1. INTRODUCTION


The Administrator Guide is a reference volume for those who administer a 
UNIX system. The guide should be used to supplement the information 
contained in the Sys5 UNIX User Reference Manual, the Sys5 UNIX 
Programmer Reference Manual, and the Sys5 UNIX Administrator 
Reference Manual. The following paragraphs contain a brief description of 
each chapter of the guide. 


The chapter “ADMINISTRATIVE ADVICE” contains helpful advice and 
suggestions for administrators of the UNIX system. 


The chapter “SETTING UP THE UNIX SYSTEM” describes the setup 
procedures for installing the Plexus Sys5 UNIX operating system. 


The chapter “AUTO CALL FACILITY INSTALLATION” outlines the 
installation procedures for a properly installed (software) automatic call-up 
facility. 

The chapter “UNIX SYSTEM ACCOUNTING” describes the structure,
implementation, and management of the accounting system. 


The chapter “FILE SYSTEM CHECKING” describes the file system check 
program (fsck) of the UNIX system. Fsck audits and interactively repairs 
inconsistencies in the file system.


The chapter “LP SPOOLING SYSTEM” defines the LP program and
describes the role of the LP administrator in performing restricted functions
and overseeing the smooth operation of LP.


The chapter “VIRTUAL PROTOCOL MACHINE” defines the Plexus virtual 
protocol machine (VPM) and describes the implementation and the 
administrative duties. 


The chapter “UNIX SYSTEM REMOTE JOB ENTRY” defines the UNIX 
system remote job entry (RJE) and describes the administrative duties of an 
RJE administrator. 


The chapter “UNIX SYSTEM ACTIVITY PACKAGE” describes the design 
and implementation of the UNIX system activity package. The package 
reports UNIX system-wide statistics.


The chapter “UUCP ADMINISTRATION” describes how a uucp network is 
set up, the format of the control files, and administrative procedures. 


2. ADMINISTRATOR’S ROAD MAP


This chapter contains administrative advice based on the experience and 
suggestions of many system administrators. Other reasonable approaches 
may be taken to solve many of the problem areas described. 


Getting started as a UNIX system administrator is hard work. There are no 
real shortcuts to a working knowledge of the system. The system 
administrator will need time for reading, studying, and hands-on 
experimenting. The system administrator should not go “live” with the 
system until he/she has had several weeks to learn the job and get the
initial hardware quirks ironed out. 


The administrator should be familiar with a lot of the distributed 
documentation. The “Introduction” and “How to Get Started” sections of the 
Sys5 UNIX User Reference Manual as well as all of the sections of the 
Sys5 UNIX Administrator Reference Manual should be studied. 


Throughout this chapter, each reference of the form name(1M), name(7), or 
name(8) refers to entries in the Sys5 UNIX Administrator Reference 
Manual. References to entries of the form name(N), where "N" is the 
number 1 or 6 possibly followed by a letter, refer to entry name in section N 
of the Sys5 UNIX User Reference Manual. If "N" is a number 2 through 5
possibly followed by a letter, the reference is to entry name in section N of
the Sys5 UNIX Programmer Reference Manual.


In these manuals, pay special attention to: acct(1M), checkall(1M), 
chmod(1), chown(1), config(1M), cpio(1), date(1), dcopy(1M), df(1M), 
don(1M), du(1), ed(1), env(1), errpt(1M), find(1), format(1M), fsck(1M), 
fuser(1M), kill(1), mail(1), mkdir(1), mkfs(1M), ncheck(1M), ps(1), rm(1), 
rmdir(1), shutdown(1M), stty(1), su(1), sync(1M), time(1), vef(1M),
volcopy(1M), wall(1M), who(1), and write(1); acct(4); all of section 7; and
crash(8), 750ops(8), 780ops(8), and 70boot(8).


2.1 DISK FREE SPACE 


Making files is easy under the UNIX operating system. Therefore, users 
tend to create numerous files using large amounts of file space. It has been 
said that the only standard thing about all UNIX systems is the message-of- 
the-day telling users to clean up their files. Administratively, both free disk 
blocks and free inodes (UNIX system talk for file headers) can be a 
problem. If the free inode count falls below 100, the system spends most of 
its time rebuilding the free inode array. If a file system runs out of space, 
the system prints “no-space” messages and does little else. To avoid
problems, the following start-of-day free counts should be maintained: 


• The file system containing /tmp (temporary files):

  - 16-user system: 1500 free kilobytes (KB).
  - 40-user system: 3000 free KB.

• The file system containing /usr:

  - 3000 to 6000 free KB depending on load.

• Other user file systems:

  - 6 to 10 percent free depending on user habits
    (3000 KB minimum).
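

The start-of-day check can be scripted. The following is only a sketch: it
assumes a df that accepts the POSIX -P and -k options (which may not match the
df supplied with this release), and the function names free_kb and check_free
are made up for illustration.

```sh
#!/bin/sh
# Sketch: compare free space against a start-of-day minimum.
# Assumption: df supports the POSIX -P and -k options.

# free_kb PATH: print the free kilobytes on the file system containing PATH.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# check_free LABEL FREE_KB MIN_KB: print a one-line verdict.
check_free() {
    if [ "$2" -lt "$3" ]; then
        echo "$1: LOW ($2 KB free, want $3 KB)"
    else
        echo "$1: ok ($2 KB free)"
    fi
}
```

For a 16-user system, the minimums listed above would be checked with
check_free /tmp "$(free_kb /tmp)" 1500 and check_free /usr "$(free_kb /usr)" 3000.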


This brings up an associated problem: how big should file systems be? The 
preference is to set aside space on each drive for a copy of root/swap and 
use the rest of the pack for a single file system. However, if you have user 
groups that fight over disk space, it may be better to split them up arbitrarily 
(i.e., divide a pack into more than one file system). If different disk drives 
are set up with differing cylinder partitions between file systems, it will 
eventually lead to an operational blunder. 


2.2 A FEW WORDS ABOUT SYSTEM TUNING 


A file system reorganization can help throughput but at the expense of
downtime. If the reorganization is done during nonprime time, it can help.


If normal shutdown and filesave procedures are used, the file system check 
program [fsck(1M), -S option] will help keep the disk free list in reasonable
order. Try to keep disk drive usage balanced. If there are over 20 users, 
the root file system (/bin, /tmp, and /etc) deserves a drive of its own. If 
there is a noisy modem (poorly executed do-it-yourself null-modem) or a 
disconnected modem cable, the UNIX system will spend a lot of CPU time 
trying to get it logged in. A random check of systems uncovers a lot of this
going on. 


2.3 WHY A SPARE DISK DRIVE IS NEEDED 


Without a spare disk drive, the system will be down when a drive is down. 
Also, without the spare drive, it is difficult to reorganize file systems or to 
save and restore user files. 


2.4 DISK PACKS 


Only fully ECC (Error Correcting Code) correctable disk packs should be 
bought. The pack should be tested; and if uncorrectable errors develop, 
recondition the pack or get rid of it. 


RP06 disk packs used with the UNIX system need not be totally error free
but must be “flag-free”. The term flag-free means that there should be no
unrecoverable ECC errors. Technically, proper ECC handling can recover
from 11-bit error bursts. However, the length of bursts can grow as a pack
ages. It is recommended that no pack that has more than 8-bit error bursts
be used.


2.5 PROTECTING USER FILES 


Users, especially inexperienced ones, occasionally remove their own files. 
Open files are sometimes lost when the system crashes. Once in a great 
while, an entire file system will be destroyed (picture a disk controller that 
goes bad and writes when it should read). Here is a suggested file backup 
procedure: 


• Each day copy all user file systems to backup packs. Keep these
  packs 3 to 5 days before reusing them.

• Once a week copy each file system to tape. Keep weekly tapes for 8
  weeks.

• Keep bimonthly tapes “forever” (they should be recopied once a
  year).


The most recent weekly tapes should be kept off premises. The other tapes 
should be in a fireproof safe if available and not too expensive. 


When the UNIX system goes down, active files can get scrambled. Your
users will not want to start the day over every time the system fails. In
addition to good backup, you must have file system patching expertise
available (on-site or on-call). If the system is ever rebooted for general use
without first checking the file systems, terrible things will happen. Study
checkall(1M), fsck(1M), and crash(8) as well as the “File System
Checking” chapter for more information.


2.6 FILE SYSTEM BACKUP PROGRAMS 
The following backup programs are distributed: 


• Find/cpio: The UNIX system is distributed in cpio format. The -cpio
  option of the find command can be used for saving only those files
  that have changed or been created over a definite period.

• Volcopy: Physical file system copying to disk or tape. For those with
  a spare drive, volcopy to disk provides convenient file restore and
  quick recovery from disk disasters. Tape volcopy provides good
  long-term backup because the file system can be read in fairly
  quickly, mounted, and browsed over. Disk and tape volcopy are
  generally used together for short- and long-term backup. Note that a
  volcopy from a mounted file system may result in an inconsistent
  copy (files being written at the time can contain invalid data).


Figure 2-1 summarizes attributes of these programs. In the figure, the file
system size is 65,500 KB in all cases; times are in minutes; judgements are
subjective.


[Figure 2-1. File System Backup Programs. The table compares FIND/CPIO,
VOLCOPY (DISK), and VOLCOPY (TAPE) on: full dump time; incremental dump
time; full restore time; incremental restore time; ease of restoring (one
file, a directory, scattered files, full restore); needs tape drive; needs
spare file system (two CPUs can share); maintains pack/tape labels; handles
multireel tape; KB per record; interactive (i.e., ties up console); may
require separate I/D space. Footnote: KB per record are cut to 22 without
separate I/D space. The individual table entries are not reproduced here.]


The spare disk drive is strongly recommended. The speed and convenience 
of volcopy are by no means the only advantage of a spare drive. It is 
strongly recommended that the administrator modify the /etc/filesave and 
/etc/checklist files to meet the operational needs and update the local 
operator's manual accordingly. Remember, the more the administrator 
automates and documents operational procedures the less downtime will be 
encountered. 


2.7 CONTROLLING DISK USAGE 


If the UNIX system is a success, disk space will soon become limited. 
During the long delay before more drives become available, usage should 
be controlled. Try to maintain the start-of-day counts recommended. Watch 
usage during the day by executing the df(1) command regularly. 


The du(1) command should be executed (after hours) regularly (e.g., daily), 
and the output kept in an accessible file for later comparison. In this way, 
users rapidly increasing their disk usage may be spotted. This can also be 
accomplished by running the accounting system's acctdusg program [see 
acct(1M)] as shown in “The Sys5 UNIX Accounting” chapter.
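

One way to compare saved du output from day to day is sketched below. It is
an illustration, not a distributed program: the function name du_growth is
made up, and it assumes each saved line has the form "blocks pathname" with no
blanks in the pathname and that the older listing is not empty.

```sh
#!/bin/sh
# Sketch: report directories whose disk usage grew between two saved
# du listings. Assumes lines of the form "blocks pathname".

# du_growth OLD NEW: print "growth pathname" for each pathname found in
# both listings whose block count increased, largest growth first.
du_growth() {
    awk '
        NR == FNR { old[$2] = $1; next }    # first file: remember old sizes
        ($2 in old) && $1 > old[$2] { print $1 - old[$2], $2 }
    ' "$1" "$2" | sort -rn
}
```

A nightly job could save the output of du -s on each user directory under a
dated name and run du_growth on yesterday's and today's listings.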


The find(1) command can be used to locate inactive (or large) files. For 
example: 


    find / -mtime +90 -atime +90 -print >somefile


records in somefile the names of files neither written nor accessed in the 
last 90 days. 


The administrator will also have to balance usage between file systems. To 
do this, user directories must be moved. Users should be taught to accept 
file system name changes (and to program around them—preferably ahead 
of time). The user’s login directory name (available in the shell variable 
HOME) should be utilized to minimize pathname dependencies. User 
groups with more extensive file system structures should set up a shell 
variable to refer to the file system name (e.g., FS). 


The find(1) and cpio(1) commands can be used to move user directories 
and to manipulate the file system tree. The following sequence is useful (it 
moves the directory trees userx and usery from file system filesys1 to file
system filesys2 where, presumably, more space is available): 


    cd /filesys1
    find userx usery -print | cpio -pdm /filesys2
    # Make sure new copy is OK.
    # Change userx and usery login directories
    # in the /etc/passwd file.
    # Notify userx and usery via mail(1) that
    # they have been moved and that pathname
    # dependencies in their .profile and shell
    # procedures may need to be changed. See the
    # discussion on $HOME above.
    rm -rf /filesys1/userx /filesys1/usery


When moving more than one user in this way, keep users with common 
interests in the same file system (these users may have linked files) and 
move groups of users who may have linked files with a single cpio 
command (otherwise linked files will be unlinked and duplicated). 


2.8 REORGANIZING FILE SYSTEMS 


There is a new file system reorganization utility called dcopy(1M). On an 
otherwise idle system, a reorganized file system has almost twice the I/O 
throughput of a randomly organized file system. This applies to file copying, 
finds, fscks, etc. Dcopy can take up to 2.5 hours to initially reorganize 
(copy) a large file system. During reorganization, the system can be up, but 
the file system being copied must be unmounted. 


For those who can afford the operator time, root reorganization once a week
(requires system reboot) and user file system reorganization once a month 
will improve system performance. Dcopy is an interim step. 


2.9 KEEPING DIRECTORY FILES SMALL 


Directories larger than 5K bytes (320 entries) are very inefficient because of 
file system indirection. A UNIX system user once complained that it took 
the system 10 minutes to complete the login process; it turned out that his 
login directory was 25K bytes long, and the login program spent that time 
fruitlessly looking for a nonexistent .profile file. A large /usr/mail or
/usr/spool/uucp directory can also really slow the system down. The
following will ferret out such directories: 


    find / -type d -size +10 -print
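

The figures above are related by simple arithmetic, sketched here under two
assumptions that are not stated in this chapter: each directory entry
occupies 16 bytes, and find counts sizes in 512-byte blocks. The function
names are made up for illustration.

```sh
#!/bin/sh
# Sketch: relate directory size in bytes to entries and to find's blocks.
# Assumptions: 16 bytes per directory entry, 512 bytes per block.
ENTRY_BYTES=16
BLOCK_BYTES=512

# dir_entries BYTES: number of entries a directory of BYTES bytes holds.
dir_entries() {
    echo $(( $1 / ENTRY_BYTES ))
}

# dir_blocks BYTES: size of the directory in blocks, rounded up.
dir_blocks() {
    echo $(( ($1 + BLOCK_BYTES - 1) / BLOCK_BYTES ))
}

dir_entries 5120     # the 5K-byte limit: prints 320
dir_blocks 5120      # prints 10, hence "-size +10"
dir_entries 25600    # the 25K-byte login directory: prints 1600
```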


Removing files from directories does not make the directories get smaller 
(the empty directory entries are available for reuse). The following will 
“compact” /usr/mail (or any other directory): 


    mv /usr/mail /usr/omail
    mkdir /usr/mail
    chmod 777 /usr/mail
    cd /usr/omail
    find . -print | cpio -plm ../mail
    cd ..
    rm -rf omail


2.10 ADMINISTRATIVE USE OF “CRON” 


The program cron(1M) is useful in the administration of the system; it can 
be used to: 


• Turn off the programs in directory /usr/games during prime time.

• Run programs off-hours:

  - accounting;
  - file system administration;
  - long-running, user-written shell procedures.
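

As an illustration of the first use, the sketch below prints two entries that
could be added to a crontab file. The details are assumptions rather than
anything this guide prescribes: prime time is taken to be 08:00 to 17:00,
Monday through Friday, the games are "turned off" by removing access to the
directory with chmod, and the function name games_crontab is made up. The
location of the crontab file depends on the installation.

```sh
#!/bin/sh
# Sketch: print crontab entries that close /usr/games during prime time.
# Assumptions: prime time is 08:00-17:00 Monday-Friday, and removing
# access to the directory is an acceptable way to disable the games.
games_crontab() {
    cat <<'EOF'
0 8 * * 1-5 chmod 700 /usr/games
0 17 * * 1-5 chmod 755 /usr/games
EOF
}

games_crontab
```

The five leading fields of each entry are minute, hour, day of month, month,
and day of week; see cron(1M) for the format accepted on your system.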


2.11 WATCH OUT FOR FILES AND DIRECTORIES THAT GROW 


Most of the files below are restarted automatically by entries in /etc/rc at
system reboot.


• Accounting files:

  - /etc/wtmp: login information; grows extremely fast with
    terminal line difficulties; use acctcon(1M) to determine the
    offending line(s).

  - /usr/adm/pacct: per process accounting records; gets big
    quickly; monitored automatically by ckpacct from cron(1M).

  - /usr/lib/cron/log: status log of commands executed by
    cron(1M); also watch this file for error messages from the
    programs being executed in /usr/spool/cron/crontab/*.

  - /usr/adm/errfile: hardware error logging info; also read login
    adm’s mail periodically.

  - /usr/adm/ctlog: a log of the people who use the ct(1C) command.

  - /usr/adm/sulog: a log of those who execute the superuser
    command.

  - /usr/adm/Spacct: process accounting files left over from an
    accounting failure; remove these files unless the accounting
    files that failed are to be rerun.

• Other files:

  - /usr/spool: spooling directory for line printers, uucp(1C), etc.,
    whose subdirectories should be compacted as described
    above.


2.12 ALLOCATING RESOURCES TO USERS 


A prospective user should first obtain authorization to use the system and 
then apply for a login by providing the following information to the “system
administrator”:


• User's name.

• Suggested login name (not more than eight characters, beginning
  with a lowercase letter and not containing special or uppercase
  letters).

• Relationships to other users (this influences the choice of the file
  system).

• Estimate of required file space (this also influences the choice of the
  file system) and connect hours. This aids in hardware growth
  planning.


Users must have passwords with at least six characters. (Only the first eight 
characters are significant.) Also, every password must have at least two 
alphabetic characters and one numeric or special character. The password 
must differ from the user's login name and any reverse or circular shift of it. 
Refer to passwd(1) and passwd(4) for more information on password 
selection and password aging. 
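

The password rules above can be expressed as a short shell function. This is
a sketch for illustration only, not the check performed by passwd(1); the
function name check_password and its messages are made up.

```sh
#!/bin/sh
# Sketch of the password rules stated above (not the passwd(1) source).

# check_password LOGIN PASSWORD: print "ok" or the first rule violated.
check_password() {
    login=$1
    pw=$2

    # Rule: at least six characters.
    [ "${#pw}" -ge 6 ] || { echo "too short"; return 1; }

    # Rule: at least two alphabetic characters.
    alpha=$(printf '%s' "$pw" | tr -cd 'A-Za-z')
    [ "${#alpha}" -ge 2 ] || { echo "needs two alphabetic characters"; return 1; }

    # Rule: at least one numeric or special character, i.e., something
    # is left after the letters are deleted.
    other=$(printf '%s' "$pw" | tr -d 'A-Za-z')
    [ -n "$other" ] || { echo "needs a numeric or special character"; return 1; }

    # Rule: must differ from the reversed login name.
    rev=$(printf '%s' "$login" |
        awk '{ for (i = length($0); i > 0; i--) printf "%s", substr($0, i, 1) }')
    [ "$pw" != "$rev" ] || { echo "matches reversed login name"; return 1; }

    # Rule: must differ from the login name and every circular shift of it.
    shifted=$login
    i=0
    while [ "$i" -lt "${#login}" ]; do
        [ "$pw" != "$shifted" ] || { echo "matches a shift of the login name"; return 1; }
        # Rotate left by one character: rest of string, then first character.
        shifted="${shifted#?}${shifted%"${shifted#?}"}"
        i=$((i + 1))
    done

    echo "ok"
}
```

For example, check_password user42 ser42u is rejected because ser42u is
user42 shifted by one position.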


2.13 THE MATTER OF ACCOUNTING AND USAGE


You should run the accounting programs even if there is not a “bill” for 
service. Otherwise, users’ habits (especially bad habits) will be a mystery to 
you. Accounting information can also help you find performance 
bottlenecks, unused logins, bad phone lines, etc. 


2.14 DIAL-LINE UTILIZATION 


If prime-time dial-line utilization gets much over 70 percent, users will start 
to encounter busy signals when dialing in. This, in turn, will lead to “line 
hogging”. The only solutions are to acquire more dial-up ports, get a larger 
(another) machine, or to get rid of users. Manual policing will help some, 
but “automatic” policing will be invariably subverted by users. 


2.15 “BIRD-DOGGING” 


When the system is busy (lines busy and/or slow response), someone 
should determine why this is so. The who(1) command lists the people 
logged in. The ps(1) command shows what they are doing. Unfortunately, 
ps operates from heuristics that can consistently fail to report certain 
processes in a busy system. That is, one must be careful about hanging up 
an apparently inactive line. The acctcom(1M) command can read the 
process accounting file /usr/adm/pacct backwards from the most recent 
entry. It will print entries for selected lines or login names.


2.16 TERMINALS 


Do not use uppercase only terminals. Use full-duplex, full-ASCII
asynchronous terminals. Hardware horizontal tabbing is very desirable 
because it increases output speed and lowers system overhead. A fair 
proportion of the terminals should provide for correspondence-quality hard 
copy output to take advantage of the UNIX system word processing 
capabilities; see term(5). 


2.17 LINE PRINTERS 


Most line printers are troublesome and impose considerable overhead on 
the system. Most also lack hardware tabs, character overstrike capability, 
etc. A printer that will work over an asynchronous link (DC1/DC3 protocol 
required) is the best bet. 


2.18 SECURITY 


The current UNIX operating system is not tamperproof. The system 
administrator cannot keep people from “breaking” the system but can 
usually detect that they have done so. The following command will mail (to
root) a list of all “set user ID” programs owned by root (superuser):


    find / -user root -perm -4100 -exec ls -l {} \; | mail root


Any surprises in root’s mail should be investigated. In dealing with security,


• Change the superuser password regularly. Do not pick obvious
  passwords (choose 6-to-8 character nonsense strings that combine
  alphabetics with digits or special characters).

• Dial ports that do not require passwords usually cause trouble.

• The chroot(1M) and su(1) commands are inherently dangerous as
  are group passwords.

• Login directories, .profile files, and files in /bin, /usr/bin, /lbin, and
  /etc that are writable by others than their respective owners are
  security weak spots; police the system regularly against them.

• Remember, no time-sharing system with dial ports is really secure.
  Do not keep top secret information on the system.
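

Policing for files writable by someone other than the owner can be done with
find(1). The sketch below is one possible approach; the function name
list_writable is made up, and it assumes a find that accepts the -perm -mode
form and parenthesized expressions.

```sh
#!/bin/sh
# Sketch: list files and directories writable by group or by others.
# Assumption: find supports "-perm -mode" and parenthesized expressions.

# list_writable DIR...: print every file or directory under the given
# directories that has the group-write or other-write permission bit set.
# Symbolic links are skipped because their own mode bits are not meaningful.
list_writable() {
    find "$@" \( -perm -020 -o -perm -002 \) ! -type l -print
}
```

It might be run as list_writable /bin /usr/bin /lbin /etc | mail root, with
the users' login directories added to the argument list.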


2.19 COMMUNICATING WITH THE USERS 


The directory /usr/news and the news(1) command are provided as a way 
to get “brief” announcements to your users. More pressing items (one- 
liners) can be entered in the /etc/motd (message of the day) file; motd and 
(new to the user) news are announced at login time. 


To reach users who are already logged in, use the wall(1M) (write all)
command. Do not use wall while logged-in as superuser, except in 
emergencies. 


The /usr/news directory should be cleaned out once a week by removing
everything older than 2 months. It has been found that on most systems a 
file in /usr/news will reach 50 percent of the users within a day and over 80 
percent within a week; motd should be cleaned out daily. 
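

The weekly cleanup can be sketched with find(1). The function name prune_news
is made up, and "2 months" is taken here to mean 60 days; adjust the count to
local policy.

```sh
#!/bin/sh
# Sketch: remove news items that have not been modified recently.
# Assumption: "2 months" is approximated as 60 days by the caller.

# prune_news DIR DAYS: remove regular files under DIR last modified
# more than DAYS days ago.
prune_news() {
    find "$1" -type f -mtime +"$2" -exec rm -f {} \;
}
```

A weekly cron(1M) entry could then run prune_news /usr/news 60.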


2.20 TROUBLESHOOTING 


It would be easy to write a book on troubleshooting. The following is some 
effective advice in dealing with troubles. In dealing with hardware support 
service personnel, 


• Be sure that the contractor agrees to get along with the UNIX
  software before you take out a hardware service contract (“It's the
  hardware,” says you; “It’s the software,” says the hardware service
  contractor).

• Keep on top of problems. Find out about any such scheme that your
  contractor may have and make them prove that it is being followed.
  Remember that an unreported problem is getting no priority at all. If
  a problem persists, escalate it through the contractor's local
  management chain; it may also be effective to complain to the
  contractor's sales representative.

• For effective service, an extended period support service offering
  (e.g., 16 hours/day, 6 days/week) should be provided. Arrange for
  preventive maintenance, noncritical repair, and add-on installation
  work to be done before or after prime time.

• Know the details of the support service offering applicable to the
  installation. In particular, make sure that preventive maintenance is
  scheduled in advance and that it is completed.

• A “site log” should be maintained for the hardware. All troubles
  should be recorded in the log by the support service personnel and/or
  the operating personnel.

• Run error logging and maintain console sheets. Make sure error
  messages are shown to support service personnel.

• Take core dumps after system crashes and have them available for
  support service personnel.

• Keep records of downtime and make sure that support service
  personnel know about them.


Telephone problems are most apt to occur when rearranging or adding 
equipment. Occasionally, central office, trunking, or modem failures occur. 
In dealing with the telephone services vendor, 


• Be specific with repair operators. Tell the operators that the trouble
  involves data equipment.

• If the first call fails to get results, ask for the “supervisor” on the
  second call, and if necessary, escalate further to get the problem
  solved.


Some of the obvious problem areas are: 


• Disk Drives: Over 50 percent of the problems are likely to be related
  to the disk subsystem. As mentioned earlier, the way to keep the
  system up is to have a spare disk drive. Remember that preventive
  maintenance of disk drives is very important. Make sure that the
  support service personnel who service the hardware see the
  error-logging printouts and console error messages produced by the UNIX
  system (and that they understand them). Disk failure can ruin a file
  system. The only defense is to make a complete, daily file backup!
  (See the part “Protecting User Files”.)

• Dial Ports: In the dial-in interface area, there is room for
  finger-pointing among all involved vendors. Check for obvious things such
  as is the system in “multiuser” mode, is the /etc/inittab file OK, or are
  any cables loose (both ends)? In some telephone offices, trunk
  hunting is based on 10-number groups. Hunting between such
  groups can fail independently of anything else. The possibilities for
  trouble are many. Figure 2-2 attempts to describe some alternatives;
  it is meant primarily for users of the DH11/DZ11 asynchronous
  devices. As an example of the format, (vertical) Rule 3 reads: “If line
  rings and ring light shows and computer does not answer and
  switching the modem solves the problem, then it is likely to be a
  telephone company problem; also, busy out that line.”

• Synchronous Ports: High-speed synchronous interface devices are
  even more trouble than dial equipment. The following is a list of
  potential trouble spots:

  - The UNIX system software.
  - Interface device (e.g., KMC11B).
  - Cable to the modem.
  - The modem.
  - The communications line.
  - Other modem.
  - Other cable.
  - Other interface device.
  - Other system's software.


[Figure 2-2. Asynchronous Line Problems. A decision table with Rules 1
through 10. Conditions: line rings; ring light shows on telephone console;
computer answers; login message received on terminal; switching modem
solves problem; user can login; telephone console shows data received;
problem affects whole DH/DZ. Diagnosis and/or action: no problem; processor
hardware problem likely; telephone problem likely; may be a problem with
user's terminal; busy out bad line(s). The individual rule entries are not
reproduced here.]


• Power Supply Modules: There are a lot of them, and they do fail,
  more or less regularly. Hard failure can be detected at the console;
  voltage drift is tougher.


2.21 DATA SET OPTIONS
The following data set options seem to work with the UNIX system: 


The 801C-L1 (Auto-Call Unit): 
Jumpers: 
E2 to E3
E6 to E5 


Options: 
Y, X, T, B, 
ZG, ZP, G, 
R, ZT 


Switches (0 = open, 1 = closed, i.e., side next to number is down): 
S1 = 1000[1] (Bracketed switches are missing on some models.) 
S2 = 0101
S3 = 11010 
S4 = 11[00] 


The 212A-L1 (1200-baud full duplex): 
Options: 
E, ZF, YF, YC, 
YG, YJ, YK, 
S, V, A, T, ZH, 
W, YP, YR 


Switches: 
S1 = [0]001
S2 = 110001000 
S3 = 11110000 (10100000 on 212AR-L1) 
S5 = 00 


2.22 NULL MODEM WIRING 


Improperly wired null modems can cause spurious interrupts, especially at 
higher baud rates. A single bad modem on a 9600-baud line can waste 15 
percent of your CPU power. The following (symmetrical) wiring plan will 
prevent such problems: 


    pin 1 to 1
    pin 2 to 3
    pin 3 to 2
    strap pin 4 to 5 in the same plug
    pin 6 to 20
    pin 7 to 7
    pin 8 to 20
    pin 20 to 6 and 8
    ground unused pins


2.23 113D, 103J DATA SET PROBLEMS 


The DH11 and DJ11 multiplexers normally have a jumper connecting pin 25 
to pin 4 (request to send), thus asserting pin 25 when the line is opened. 
This jumper should be removed for any lines connected to 113Ds or 103Js 
(also applies to 103Js with 801s). 


3. SETTING UP UNIX


This chapter describes the load and upgrade procedures for the Plexus 
implementation of Sys5 UNIX. The Plexus UNIX Sys5 consists of: 


• a release tape (cartridge or 9-track),
• this release document (98-40126.0 Ver. F).


The Release Tape comprises 22 files. Files 0-19 are blocked at 1024 bytes 
per record; file 20 is blocked at 10240 bytes per record; files 21 through the 
end of the tape are blocked at 5120 bytes per record. Most of the tape files 
are Release 5.21 standalone programs. These are for backup and 
emergency purposes, in case the disk copies of the standalones become 
inaccessible and you need to run the standalone programs from tape. File 
20 is a dump of a bootstrap minimum file system release 5.21 system. Files 
21, 22, and 23 are cpio format files comprising the full release. File 21
contains everything except the /usr/catman and /usr/man directories. 
These directories are in files 22 and 23. 


3.1 Reloading Procedures 


This chapter gives procedures for the basic steps required to reload Sys5. 
Remember to follow these procedures only if 


1. your system has a new primary disk and system software must all be 
reloaded; or 


2. your system has experienced a catastrophic failure such that the 
system software is lost and you do not have a dump backup of your 
operating system. 


The first section of this chapter is a checklist to reload Sys5. Each 
subsequent section of this chapter corresponds to one item in the checklist. 


3.1.1 Checklist to Reload Sys5.21 


This section gives a checklist to reload Sys5.21. Each step in this checklist 
is described in a separate subsection below. 


1. Install the Release Tape and run mkfs(1M).

2. Start up the system and install the full release.


3.1.1.1 Install the Release Tape


Follow these directions to load the system software onto a new disk. This
procedure destroys any previous contents of the disk.


To load the tape: 


1. Turn on system power. Press reset button. 


2. Wait for "PLEXUS SELFTEST REV X.X COMPLETE" message. The
system informs you about the disk and tape driver names in use on
your system (e.g., pd, pt), tells you about the various boards (e.g., 
Ethernet, ICPs), and tells the memory size. Then the boot message 
appears. The boot message is "PLEXUS PRIMARY BOOT REV X.X". 
After the boot message comes the ":" prompt. 


3. The disks come preformatted from the factory. In the event of a major
catastrophe you will be required to reformat the disks. See the Plexus 
User's Manual for instructions on how to do this if necessary. 


4. Make a file system on the disk with the standalone program mkfs. To 
do this, mount the Release Tape and respond as indicated in bold 
below. The file system size is given in 1024-byte blocks. 


NOTE: mkfs DESTROYS THE DATA ON THE DESIGNATED 
FILE SYSTEM, SO USE CAUTION!!! 


The sequence is 


    : mkfs
    $$ mkfs /dev/dsk/0s1 isize m 500
    isize = <some number -- varies according to file system size>
    m/n = 1 500


where “m” is the interlace factor. See below for your interlace factor.


    Controller and disk               Interlace factor

    IMSP
        all                           1
    Xylogics (xy)
        8" 142.6M-byte NEC            7
        14" 276.8M-byte Fujitsu       7
        14" 562M-byte Fujitsu         12

If you have not changed the size of /dev/dsk/0s1, “isize” is 18000
(16000 for 22 Mbyte disks). When the mkfs is finished, the system
prints the message “Exit 0”.


5. Restore the bootstrap, “minimum” file system onto the disk. Use the
standalone program restor from the Release Tape. This loads the
files:


    /bin
    /bin/cpio
    /bin/date
    /bin/echo
    /bin/env
    /bin/login
    /bin/ls
    /bin/mkdir
    /bin/pwd
    /bin/sh
    /bin/su
    /bin/sync
    /bin/uname
    /dev
    /dev/console
    /dev/dsk
    /dev/dsk/0s0
    /dev/dsk/0s1
    /dev/dsk/0s2
    /dev/error
    /dev/rdsk
    /dev/rdsk/0s0
    /dev/rdsk/0s1
    /dev/rdsk/0s2
    /dev/rfp
    /dev/rfp/0s0
    /dev/rmt
    /dev/rpt
    /dev/rpt/0m
    /dev/rpt/0mn
    /dev/rrm
    /dev/rrm/0m
    /dev/rrm/0mn
    /dev/rrm/0hm
    /dev/rrm/0hmn
    /dev/swap
    /dev/syscon
    /dev/systty
    /dev/tty
    /etc
    /etc/fsck
    /etc/fsdb
    /etc/getty
    /etc/grpck
    /etc/init
    /etc/inittab
    /etc/mkfs
    /etc/mknod
    /etc/mount
    /etc/passwd
    /etc/rc
    /etc/umount
    /tmp
    /unix
    /usr
    /usr/plx
    /usr/plx/tape


These are the minimum files necessary to boot /unix and load the rest 
of the system from file 21. 


NOTE: To abort the process, hit the reset button. 


The sequence is: (Your response is in bold.)


    : restor
    $$ restor rf rrr /dev/dsk/0s1 +20
    Spacing forward 20 files on tape


where rrr is the media device. For the 9-track tape use /dev/rrm/0m
and for the cartridge use /dev/rpt/0m.


The final remark from the restor program before it commences to
restore the file system is


    Last chance before scribbling on /dev/dsk/0s1.


Respond with a <return> when you are ready to begin the restor.


Note that the restor is not done at this point. Do not remove the tape or
reset the machine until you see the message “Exit 0”.

3.1.1.2 Startup and Install the Release 


Follow the procedures to bring up your system (see section 2-1) and enter
init state 1. 


Now load the contents of file 21 (the full file system) from the Release Tape. 
For 9-track tape, issue the commands: 


/usr/plx/tape srcheof 21
cpio -idmvB < /dev/rrm/0m          (loads file 21)
sync


For cartridge tape, issue the commands:


/usr/plx/tape -f /dev/rpt/0m srcheof 21
cpio -idmvB < /dev/rpt/0m          (loads file 21)
sync 


Finally, install the files currently in use (/bin/sh, /bin/cpio, and /etc/init). Run
the shell procedure /tmp/release/fixup as follows:


/tmp/release/fixup
<control> d
sync
sync


Reboot according to the instructions in the Plexus User’s Manual. After the
system returns to the init state 1, issue the command:


rm -rf /tmp/release
sync


The last required step to reload Sys5 is to go to multi-user state.


4. ACU


Before using the procedures outlined in this chapter, the administrator 
should be familiar with the procedures and commands used to install a UNIX 
system and should have read the “SETTING UP THE UNIX SYSTEM” 
chapter of this guide. 


The UNIX system has the capability to utilize an Auto Call Unit (ACU) for 
setting up a facility for dialing-up and communicating with other systems. 
The user level interface for the device is the command cu. The system may 
also use the ACU for networking via the uucp(1C) programs [See cu(1C) 
and uucp(1C) in the Sys5 UNIX User Reference Manual]. A system 
administrator, however, must first make certain software adjustments to the 
system to enable the device to be used. These adjustments are listed 
below. 


4.1 Procedures 


The following step-by-step procedures should ensure a properly installed
(software) automatic call-up facility. When installing more than one facility,
adjust the instructions accordingly. An ACU interface (DN-11 line card), an
801-type ACU, and a modem are required per facility.


1. The ACU must first be optioned correctly and associated with a 
dedicated dial-in port. For the following examples, /dev/tty00 will be 
the dedicated dial-in port. 


The 801-type ACU contains the following switches and should be
optioned as shown below. A 0 = open and a 1 = closed (i.e., side
next to number is down); bracketed switches are missing on some
models.


S1 = 1000[1]
S2 = 0101
S3 = 11010
S4 = 11[00]


When using 212-type data sets, ensure the optioning of these also.


S1 = [0]001
S2 = 110001100
S3 = 11110000
S4 = 00


When using the shared ACU facility, the 57B1 data unit will have to be 
properly optioned. If the 57B1 data unit is the only sharing circuit in an 
arrangement or the first in an arrangement using two data mountings, 
all the positions of switches S9 and S10 must be open. If the 57B1 is
the second sharing circuit in an arrangement using two data
mountings, all positions of switches S9 and S10 must be closed. 
There are no customer selected options. 


All make-busy (MB) and service-line (SL) switches on the data 
mountings should be in the up or off position (i.e., do not busy them 
out). The make-busy switches on the 57B1s should be in the off
position for the slots with data sets and in the make-busy position for 
unequipped or unused slots. 


To use the 212A data set at high speed, the high-speed button must 
be depressed. This is not software switchable. If high speed is 
chosen, any UNIX system called must have 1200-baud answer 
capability or connection will not be possible.


Edit /etc/inittab and turn the getty process off for multiuser state (e.g., 
state 2) by changing the flags field to off. 


00:2:off:/etc/getty tty00 1200 #


This will not allow anyone to dial into this dedicated port.


Include the ACU interface device driver entry in your system 
description file with the appropriate unit information; for example: 


dn11 370 175200 4 1


Note the major device number generated by config -t.


Using the major device number, make the following device entries in 
the /dev directory. The name dn0 is commonly used with UNIX 
systems.


# mknod /dev/dn0 c 7 0
# ln /dev/dn0 /dev/cua0
# ln /dev/tty00 /dev/cul0


The linked names of cul0 and cua0 are simply aliases of the access
and line devices that are known by the cu and uucp commands. 


Some configurations may incorporate a shared ACU capability (57A1 
and 57B1 data units) that enables one ACU interface device and one 
ACU to dial numbers and establish connections for up to 12 phone 
lines (data sets). 


In this configuration, the data sets are located in two mounting 
racks—six to a rack. The data sets are numbered 1 through 6 in the 
first rack and 9 through 14 in the second rack. The data set number
should be reflected in the high order 4 bits of the minor device number
of the ACU interface device entries in /dev.


As a comparison, to install four ACU facilities on a UNIX system would
have previously required four DN-11 line cards, four 801-type ACUs, 
four data sets, and the following nodes: 


ENTRY          Minor Device #

/dev/dn0       0
/dev/dn1       1
/dev/dn2       2
/dev/dn3       3


With the shared ACU facility, only one DN-11 line card and one ACU 
are required. Up to 12 data sets can be serviced and their nodes 
would be: 


ENTRY          Minor Device #

/dev/dn0       020    (16)
/dev/dn1       040    (32)
/dev/dn2       060    (48)
/dev/dn3       0100   (64)
/dev/dn4       0120   (80)
/dev/dn5       0140   (96)
/dev/dn6       0220   (144)
/dev/dn7       0240   (160)
/dev/dn8       0260   (176)
/dev/dn9       0300   (192)
/dev/dn10      0320   (208)
/dev/dn11      0340   (224)


7. Change the modes on the two devices to read/write by all. 
# chmod 666 /dev/cua0 /dev/cul0


Note that the modes of the aliases of cul0 and cua0 are changed
automatically because they are links. 


8. Ensure that an appropriate entry exists in the file
/usr/lib/uucp/L-devices.


ACU cul0 cua0 300 


If the high speed was chosen on the 212A data set, this line should 
have the speed 1200-baud instead of 300-baud. 


9. After completing the above steps, make a new operating system,
install it as /unix, and reboot the system.


The above are only examples and should not necessarily be copied directly
for your system. 
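

The shared-ACU minor device numbers listed in the procedure follow from the rule that the data set number occupies the high-order 4 bits of the minor number. The sketch below only reproduces that arithmetic so an entry can be checked before running mknod; the function name acu_minor is hypothetical and is not a system command.

```shell
# acu_minor N: print the minor device number for data set N
# (1-6 in the first rack, 9-14 in the second) in octal and decimal.
# Hypothetical helper for illustration; not part of the system.
acu_minor() {
    n=$1
    case "$n" in
        [1-6]|9|1[0-4]) ;;
        *) echo "acu_minor: data set must be 1-6 or 9-14" >&2; return 1 ;;
    esac
    minor=$(( n << 4 ))          # data set number in the high-order 4 bits
    printf '0%o (%d)\n' "$minor" "$minor"
}

# Print the twelve minor numbers in table order.
for ds in 1 2 3 4 5 6 9 10 11 12 13 14; do
    acu_minor "$ds"
done
```

The octal form is the one shown in the table; the decimal value in parentheses is the same number, which is convenient when passing the minor number to mknod.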


4.2 Diagnosing Problems 


If the above steps are followed precisely and the unit still does not work, the 
hardware should be checked out. Problems should be diagnosed in the 
following order:


1. Ensure that the lock file (/usr/spool/uucp/LCK...) is not present from 
earlier failed attempts with cu or uucp. 


2. Perform the self-tests on the data sets, the ACUs, and the sharing 
hardware (these tests are described in literature that comes with the 
devices). 


3. Verify that both the ACU and the data set are correctly optioned as 
described above. 


4. Check that the ACU interface (DN-11) is pulsing digits to the ACU.


5. Determine that the ACU is dialing the correct number.


6. Ensure the integrity of the data set by using it as a dial-up port.


7. Determine that the cable leads are not defective.


5. UNIX SYSTEM ACCOUNTING


The UNIX system accounting provides methods to collect per-process 
resource utilization data, record connect sessions, monitor disk utilization, 
and charge fees to specific logins. A set of C language programs and shell 
procedures is provided to reduce this accounting data into summary files 
and reports. This chapter describes the structure, implementation, and 
management of this accounting system, as well as a discussion of the 
reports generated and the meaning of the columnar data. 


Throughout this chapter, each reference of the form name(1M), name(7), or 
name(8) refers to entries in the Sys5 UNIX Administrator Reference 
Manual. References to entries of the form name(N), where "N" is the 
number 1 or 6 possibly followed by a letter, refer to entry name in section N 
of the Sys5 UNIX User Reference Manual. If "N" is a number (2 through 5)
possibly followed by a letter, refer to entry name in section N of the Sys5 
UNIX Programmer Reference Manual. 


The following list is a synopsis of the actions of the accounting system: 


•  At process termination, the UNIX system kernel writes one record per
   process in /usr/adm/pacct in the form of acct.h.


•  The login and init programs record connect sessions by writing
   records into /etc/wtmp. Date changes, reboots, and shutdowns (via
   acctwtmp) are also recorded in this file.


•  The disk utilization programs acctdusg and diskusg break down disk
   usage by login.


•  Fees for file restores, etc., can be charged to specific logins with the
   chargefee shell procedure.


•  Each day the runacct shell procedure is executed via cron to reduce
   accounting data and produce summary files and reports.


•  The monacct procedure can be executed on a monthly or fiscal
   period basis. It saves and restarts summary files, generates a report,
   and cleans up the sum directory. These saved summary files could
   be used to charge users for UNIX system usage.


5.1 Files and Directories 


The /usr/lib/acct directory contains all of the C language programs and shell 
procedures necessary to run the accounting system. The adm login 
(currently user ID of 4) is used by the accounting system and has the login 
directory structure shown in Figure 5-1.


                /usr/adm
                   |
                  acct
                   |
        +----------+----------+
        |          |          |
      nite        sum      fiscal


Figure 5-1. Directory Structure of the “adm” Login


The /usr/adm directory contains the active data collection files. (For a 
complete explanation of the files used by the accounting system, see Figure 
5-2 at the end of this section.) The nite directory contains files that are 
reused daily by the runacct procedure. The sum directory contains the 
cumulative summary files updated by runacct. The fiscal directory contains 
periodic summary files created by monacct. 


5.2 Daily Operation 


When the UNIX system is switched into multiuser mode, /usr/lib/acct/startup
is executed, which does the following:


1. The acctwtmp program adds a “boot” record to /etc/wtmp. This
record is signified by using the system name as the login name in the
wtmp record.


2. Process accounting is started via turnacct. Turnacct on executes the 
accton program with the argument /usr/adm/pacct. 


3. The remove shell procedure is executed to clean up the saved pacct 
and wtmp files left in the sum directory by runacct. 


The ckpacct procedure is run via cron every hour of the day to check the 
size of /usr/adm/pacct. If the file grows past 1000 blocks (default), turnacct
switch is executed. The advantage of having several smaller pacct files 
becomes apparent when trying to restart runacct after a failure processing 
these records. 
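

The size check that ckpacct performs can be pictured with the following sketch. It is not the ckpacct source: the function name pacct_too_big is hypothetical, and the 512-byte block size used to convert bytes to blocks is an assumption made for the example.

```shell
# pacct_too_big FILE [LIMIT]: succeed if FILE is larger than LIMIT blocks
# (default 1000). A block is assumed to be 512 bytes here; that size and
# the function name are assumptions for illustration.
pacct_too_big() {
    file=$1
    limit=${2:-1000}
    [ -f "$file" ] || return 1
    bytes=$(wc -c < "$file")
    blocks=$(( (bytes + 511) / 512 ))    # round up to whole blocks
    [ "$blocks" -gt "$limit" ]
}

# Usage sketch: switch to a new pacct file once the limit is passed.
# if pacct_too_big /usr/adm/pacct; then turnacct switch; fi
```

Keeping the limit as an optional second argument mirrors the idea of a default that an installation may want to change.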


The chargefee program can be used to bill users for file restores, etc. It 
adds records to /usr/adm/fee which are picked up and processed by the 
next execution of runacct and merged into the total accounting records. 


Runacct is executed via cron each night. It processes the active 
accounting files, /usr/adm/pacct, /etc/wtmp, /usr/adm/acct/nite/disktacct,
and /usr/adm/fee. It produces command summaries and usage summaries
by login.


When the system is shut down using shutdown, the shutacct shell
procedure is executed. It writes a shutdown reason record into /etc/wtmp
and turns process accounting off.


After the first reboot each morning, the computer operator should execute 
/usr/lib/acct/prdaily to print the previous day’s accounting report.


5.3 Setting Up the Accounting System


In order to automate the operation of this accounting system, several things 
need to be done: 


1. If not already present, add this line to the /etc/rc file in the state 2
section:


/bin/su - adm -c /usr/lib/acct/startup


2. If not already present, add this line to /etc/shutdown to turn off the 
accounting before the system is brought down: 


/usr/lib/acct/shutacct 


3. For most installations, the following three entries should be made in 
/usr/spool/cron/crontab/adm so that cron will automatically run the 
daily accounting. 


0 4 * * 1-6 /usr/lib/acct/runacct 2> /usr/adm/acct/nite/fd2log
0 2 * * 4 /usr/lib/acct/dodisk
5 * * * * /usr/lib/acct/ckpacct


4. To facilitate monthly merging of accounting data, the following entry in 
/usr/spool/cron/crontab/adm will allow monacct to clean up all daily 
reports and daily total accounting files and deposit one monthly total 
report and one monthly total accounting file in the fiscal directory.


15 5 1 * * /usr/lib/acct/monacct


The above entry takes advantage of the default action of monacct that 
uses the current month's date as the suffix for the file names. Notice 
that the entry is executed at such a time as to allow runacct sufficient 
time to complete. This will, on the first day of each month, create 
monthly accounting files with the entire month's data. 


5. The PATH shell variable should be set in /usr/adm/.profile to:


PATH=/usr/lib/acct:/bin:/usr/bin


5.4 Runacct


Runacct is the main daily accounting shell procedure. It is normally initiated 
via cron during nonprime time hours. Runacct processes connect, fee, 
disk, and process accounting files. It also prepares daily and cumulative 
summary files for use by prdaily or for billing purposes. The following files 
produced by runacct are of particular interest:


nite/lineuse Produced by acctcon, which reads the wtmp file and
produces usage statistics for each terminal line on the
system. This report is especially useful for detecting
bad lines. If the ratio of logoffs to logins exceeds
about 3/1, there is a good possibility that the line is
failing.


nite/daytacct This file is the total accounting file for the previous 
day in tacct.h format. 


sum/tacct This file is the accumulation of each day’s 
nite/daytacct and can be used for billing purposes. It 
is restarted each month or fiscal period by the 
monacct procedure. 


sum/daycms Produced by the acctcms program. It contains the 
daily command summary. The ASCII version of this 
file is nite/daycms. 


sum/cms The accumulation of each day's command 
summaries. It is restarted by the execution of 
monacct. The ASCII version is nite/cms. 


sum/loginlog Produced by the lastlogin shell procedure. It 
maintains a record of the last time each login was 
used. 

sum/rprtMMDD Each execution of runacct saves a copy of the daily
report that can be printed by prdaily.
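

The rule of thumb given for nite/lineuse (a logoff-to-login ratio above about 3/1 suggests a failing line) can be applied mechanically. The sketch below assumes a simplified three-column input of line name, logins, and logoffs; that layout and the function name flag_bad_lines are assumptions for illustration, and the real report carries more columns.

```shell
# flag_bad_lines: read "line  logins  logoffs" records on standard input
# and print the lines whose logoff-to-login ratio exceeds 3 to 1.
# The three-column input layout is an assumption for illustration; the
# real nite/lineuse report has more columns.
flag_bad_lines() {
    awk '
        $2 > 0 && $3 / $2 > 3 { printf "%s %d/%d\n", $1, $3, $2 }
        $2 == 0 && $3 > 0     { printf "%s %d/0\n", $1, $3 }
    '
}
```

A line with logoffs but no logins at all is reported as well, since it cannot be a healthy line under the same reasoning.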


Runacct takes care not to damage files in the event of errors. A series of 
protection mechanisms are used that attempt to recognize an error, provide 
intelligent diagnostics, and terminate processing in such a way that runacct 
can be restarted with minimal intervention. It records its progress by writing 
descriptive messages into the file active. (Files used by runacct are 
assumed to be in the nite directory unless otherwise noted.) All diagnostics 
output during the execution of runacct is written into fd2log. Runacct will
complain if the files lock and lock1 exist when invoked. The lastdate file
contains the month and day runacct was last invoked and is used to 
prevent more than one execution per day. If runacct detects an error, a 
message is written to /dev/console, mail is sent to root and adm, locks are 
removed, diagnostic files are saved, and execution is terminated. 


In order to allow runacct to be restartable, processing is broken down into 
separate reentrant states. A file is used to remember the last state 
completed. When each state completes, statefile is updated to reflect the 
next state. After processing for the state is complete, statefile is read and 
the next state is processed. When runacct reaches the CLEANUP state, it
removes the locks and terminates. States are executed as follows:


SETUP           The command turnacct switch is executed. The
                process accounting files, /usr/adm/pacct?, are moved
                to /usr/adm/Spacct?.MMDD. The /etc/wtmp file is
                moved to /usr/adm/acct/nite/wtmp.MMDD with the
                current time added on the end.


WTMPFIX         The wtmp file in the nite directory is checked for
                correctness by the wtmpfix program. Some date
                changes will cause acctcon1 to fail, so wtmpfix
                attempts to adjust the time stamps in the wtmp file if
                a date change record appears.


CONNECT1        Connect session records are written to ctmp in the
                form of ctmp.h. The lineuse file is created, and the
                reboots file is created showing all of the boot records
                found in the wtmp file.


CONNECT2        Ctmp is converted to ctacct.MMDD, which are
                connect accounting records. (Accounting records are
                in tacct.h format.)


PROCESS         The acctprc1 and acctprc2 programs are used to
                convert the process accounting files,
                /usr/adm/Spacct?.MMDD, into total accounting
                records in ptacct?.MMDD. The Spacct and ptacct
                files are correlated by number so that if runacct fails
                the unnecessary reprocessing of Spacct files will not
                occur. One precaution should be noted: when
                restarting runacct in this state, remove the last
                ptacct file because it will not be complete.


MERGE           Merge the process accounting records with the
                connect accounting records to form daytacct.


FEES            Merge in any ASCII tacct records from the file fee
                into daytacct.


DISK            On the day after the dodisk procedure runs, merge
                disktacct with daytacct.


MERGETACCT      Merge daytacct with sum/tacct, the cumulative total
                accounting file. Each day, daytacct is saved in
                sum/tacctMMDD, so that sum/tacct can be recreated
                in the event it becomes corrupted or lost.


CMS             Merge in today's command summary with the
                cumulative command summary file sum/cms.
                Produce ASCII and internal format command
                summary files.


USEREXIT        Any installation dependent (local) accounting
                programs can be included here.


CLEANUP         Clean up temporary files, run prdaily and save its
                output in sum/rprtMMDD, remove the locks, then exit.
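

The restart mechanism described above, in which statefile names the next state to run, can be sketched as a small loop. This is an illustration of the idea only and not the runacct source: each state merely logs its name to the active file, the function name run_states is hypothetical, removing statefile at the end is a simplification, and the state name CMS for the command summary step is taken from the standard System V runacct.

```shell
# run_states DIR: walk the runacct state sequence, resuming from whatever
# state is recorded in DIR/statefile. Each state here only logs its name
# to DIR/active; the real work done by runacct is omitted. This is a
# sketch of the restart mechanism, not the runacct source.
run_states() {
    dir=$1
    [ -f "$dir/statefile" ] || echo SETUP > "$dir/statefile"
    while :; do
        state=$(cat "$dir/statefile")
        case "$state" in
            SETUP)      next=WTMPFIX ;;
            WTMPFIX)    next=CONNECT1 ;;
            CONNECT1)   next=CONNECT2 ;;
            CONNECT2)   next=PROCESS ;;
            PROCESS)    next=MERGE ;;
            MERGE)      next=FEES ;;
            FEES)       next=DISK ;;
            DISK)       next=MERGETACCT ;;
            MERGETACCT) next=CMS ;;
            CMS)        next=USEREXIT ;;
            USEREXIT)   next=CLEANUP ;;
            CLEANUP)    echo "$state" >> "$dir/active"
                        rm -f "$dir/statefile"   # simplification for the sketch
                        return 0 ;;
            *)          echo "Invalid state: $state" >&2
                        return 1 ;;
        esac
        echo "$state" >> "$dir/active"       # record progress
        echo "$next" > "$dir/statefile"      # remember where to resume
    done
}
```

Because the next state is written only after the current one finishes, an interrupted run resumes at the state that did not complete, which is why each state has to be safe to repeat.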


5.5 Recovering From Failure 


The runacct procedure can fail for a variety of reasons, usually a
system crash, /usr running out of space, or a corrupted wtmp file. If the
activeMMDD file exists, check it first for error messages. If the active file
and lock files exist, check fd2log for any mysterious messages. The
following are error messages produced by runacct and the recommended
recovery actions:


ERROR: locks found, run aborted 


The files lock and lock1 were found. These files must be
removed before runacct can restart.


ERROR: acctg already run for date : check /usr/adm/acct/nite/lastdate 


The date in lastdate and today’s date are the same. Remove
lastdate.


ERROR: turnacct switch returned rc=? 


Check the integrity of turnacct and accton. The accton 
program must be owned by root and have the setuid bit set. 


ERROR: Spacct?.MMDD already exists


File setups probably already run. 
Check status of files, then run setups manually. 


ERROR: /usr/adm/acct/nite/wtmp.MMDD already exists, run setup manually


Self-explanatory.


ERROR: wtmpfix errors see /usr/adm/acct/nite/wtmperror


Wtmpfix detected a corrupted wtmp file. Use fwtmp to 
correct the corrupted file. 


ERROR: connect acctg failed: check /usr/adm/acct/nite/log 


The acctcon1 program encountered a bad wtmp file. Use 
fwtmp to correct the bad file. 


ERROR: Invalid state, check /usr/adm/acct/nite/active


The file statefile is probably corrupted. Check
statefile and read active before restarting.


5.6 Restarting Runacct 


Runacct called without arguments assumes that this is the first invocation of 
the day. The argument MMDD is necessary if runacct is being restarted 
and specifies the month and day for which runacct will rerun the 
accounting. The entry point for processing is based on the contents of 
statefile. To override statefile, include the desired state on the command
line. For example: 


To start runacct: 

nohup runacct 2> /usr/adm/acct/nite/fd2log& 

To restart runacct: 

nohup runacct 0601 2> /usr/adm/acct/nite/fd2log&

To restart runacct at a specific state:

nohup runacct 0601 WTMPFIX 2> /usr/adm/acct/nite/fd2log&


5.7 Fixing Corrupted Files


Unfortunately, this accounting system is not entirely foolproof. Occasionally, 
a file will become corrupted or lost. Some of the files can simply be ignored 
or restored from the file save backup. However, certain files must be fixed 
in order to maintain the integrity of the accounting system. 


5.7.1 Fixing WTMP Errors


The wtmp files seem to cause the most problems in the day-to-day 
operation of the accounting system. When the date is changed and the 
UNIX system is in multiuser mode, a set of date change records is written 
into /etc/wtmp. The wtmpfix program is designed to adjust the time stamps 
in the wtmp records when a date change is encountered. However, some 
combinations of date changes and reboots will slip through wtmpfix and 
cause acctcon1 to fail. The following steps show how to patch up a wtmp
file.


cd /usr/adm/acct/nite
fwtmp < wtmp.MMDD > xwtmp
ed xwtmp
    delete corrupted records or
    delete all records from beginning up to the date change
fwtmp -ic < xwtmp > wtmp.MMDD


If the wtmp file is beyond repair, create a null wtmp file. This will prevent
any charging of connect time. Acctprc1 will not be able to determine which 
login owned a particular process, but it will be charged to the login that is 
first in the password file for that user id. 


5.7.2 Fixing TACCT Errors 


If the installation is using the accounting system to charge users for system 
resources, the integrity of sum/tacct is quite important. Occasionally, 
mysterious tacct records will appear with negative numbers, duplicate user 
IDs, or a user ID of 65,535. First check sum/tacctprev with prtacct. If it 
looks all right, the latest sum/tacct.MMDD should be patched up, then
sum/tacct recreated. A simple patchup procedure would be: 


cd /usr/adm/acct/sum
acctmerg -v < tacct.MMDD > xtacct
ed xtacct
    remove the bad records
    write duplicate uid records to another file
acctmerg -i < xtacct > tacct.MMDD
acctmerg tacctprev < tacct.MMDD > tacct


Remember that the monacct procedure removes all the tacct.MMDD files;
therefore, sum/tacct can be recreated by merging these files together. 


5.8 Updating Holidays 


The file /usr/lib/acct/holidays contains the prime/nonprime table for the 
accounting system. The table should be edited to reflect your location's 
holiday schedule for the year. The format is composed of three types of 
entries: 


1. Comment Lines: Comment lines may appear anywhere in the file as 
long as the first character in the line is an asterisk. 


2. Year Designation Line: This line should be the first data line 
(noncomment line) in the file and must appear only once. The line 
consists of three fields of four digits each (leading white space is 
ignored). For example, to specify the year as 1982, prime time at 9:00
a.m., and nonprime time at 4:30 p.m., the following entry would be
appropriate:


1982 0900 1630


A special condition allowed for in the time field is that the time 2400 is 
automatically converted to 0000. 


3. Company Holidays Lines: These entries follow the year designation 
line and have the following general format: 


day-of-year Month Day Description of Holiday 


The day-of-year field is a number in the range of 1 through 366 
indicating the day for the corresponding holiday (leading white space is 
ignored). The other three fields are actually commentary and are not 
currently used by other programs. 
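

The three entry types can be checked with a short script before the edited table is put into service. The following is a minimal sketch based only on the format described above; the function name parse_holidays is hypothetical, the checker is not part of the accounting package, and skipping blank lines is a convenience assumed here rather than something the format promises.

```shell
# parse_holidays: read a holidays-format file on standard input and
# summarize it. Lines starting with "*" are comments, the first data
# line is "year prime-start nonprime-start", and each later line starts
# with a day-of-year number. Hypothetical checker, not part of the
# accounting package.
parse_holidays() {
    awk '
        /^\*/ || NF == 0 { next }                 # comment or blank line
        !seen_year {
            seen_year = 1
            prime = $2; nonprime = $3
            if (prime == "2400") prime = "0000"   # 2400 is treated as 0000
            if (nonprime == "2400") nonprime = "0000"
            printf "year=%s prime=%s nonprime=%s\n", $1, prime, nonprime
            next
        }
        $1 >= 1 && $1 <= 366 { printf "holiday day=%d\n", $1; next }
        { printf "bad entry: %s\n", $0 }
    '
}
```

For example, feeding it a comment line, the line "1982 0900 1630", and holiday lines beginning with 1 and 359 reports the year line followed by the two day numbers; only the first field of each holiday line is examined, in keeping with the note that the remaining fields are commentary.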


5.9 Daily Reports 


Runacct generates five basic reports upon each invocation. They cover the 
areas of connect accounting, usage by person on a daily basis, command 
usage reported by daily and monthly totals, and a report of the last time 
users were logged in. 


The following paragraphs describe the reports and the meanings of their 
tabulated data. 


5.9.1 Daily Report 


In the first part of the report, the from/to banner should alert the 
administrator to the period reported on. The times are the time the last 
accounting report was generated until the time the current accounting report 
was generated. It is followed by a log of system reboots, shutdowns, power 
fail recoveries, and any other record dumped into /etc/wtmp by the 
acctwtmp program [see acct(1M) in the Sys5 UNIX Administrator
Reference Manual]. 


The second part of the report is a breakdown of line utilization. The TOTAL 
DURATION tells how long the system was in multiuser state (able to be 
accessed through the terminal lines). The columns are: 


LINE The terminal line or access port. 

MINUTES The total number of minutes that line was in use 
during the accounting period. 

PERCENT The total number of MINUTES the line was in use
divided by the TOTAL DURATION.

# SESS The number of times this port was accessed for a
login(1) session.


# ON This column does not have much meaning anymore.
It used to give the number of times that the port was
used to log a user on; but since login(1) can no
longer be executed explicitly to log in a new user, this
column should be identical with # SESS.


# OFF This column reflects not just the number of times a 
user logged off but also any interrupts that occur on 
that line. Generally, interrupts occur on a port when 
the getty(1M) is first invoked when the system is 
brought to multiuser state. Where this column does 
come into play is when the # OFF exceeds the # ON 
by a large factor. This usually indicates that the 
multiplexer, modem, or cable is going bad, or there is 
a bad connection somewhere. The most common 
cause of this is an unconnected cable dangling from 
the multiplexer. 


During real time, /etc/wtmp should be monitored as this is the file that the 
connect accounting is geared from. If it grows rapidly, execute acctcon1 to 
see which tty line is the noisiest. If the interrupting is occurring at a furious
rate, general system performance will be affected.


5.9.2 Daily Usage Report 


This report gives a by-user breakdown of system resource utilization. Its 
data consists of: 


UID The user ID. 


LOGIN NAME The login name of the user; there can be more 
than one login name for a single user ID, this 
identifies which one. 


CPU (MINS) This represents the amount of time the user’s 
process used the central processing unit. This 
category is broken down into PRIME and 
NPRIME (nonprime) utilization. The accounting 
system's idea of this breakdown is located in 
the /usr/lib/acct/holidays file. As delivered,
prime time is defined to be 0900 through 1700
hours.


KCORE-MINS This represents a cumulative measure of the
amount of memory a process uses while
running. The amount shown reflects kilobyte 
segments of memory used per minute. This 
measurement is also broken down into PRIME 
and NPRIME amounts.


CONNECT (MINS) This identifies “Real Time” used. What this
column really identifies is the amount of time
that a user was logged into the system. If this
time is rather high and the column “# OF
PROCS” is low, this user is what is called a
“line hog”. That is, this person logs in first thing
in the morning and hardly touches the
terminal the rest of the day. Watch out for
these kinds of users. This column is also
subdivided into PRIME and NPRIME utilization.


DISK BLOCKS When the disk accounting programs have been
run, the output is merged into the total
accounting record (tacct.h) and shows up in this
column. This disk accounting is accomplished
by the program acctdusg.


# OF PROCS This column reflects the number of processes
that were invoked by the user. This is a good
column to watch for large numbers indicating
that a user may have a shell procedure that
runs amok.


# OF SESS This is how many times the user logged onto
the system.


# DISK SAMPLES This indicates how many times the disk
accounting was run to obtain the average
number of DISK BLOCKS listed earlier.


FEE An often unused field in the total accounting
record, the FEE field represents the total
accumulation of widgets charged against the
user by the chargefee shell procedure [see
acctsh(1M)]. The chargefee procedure is used
to levy charges against a user for special
services performed such as file restores, etc.


5.9.3 Daily Command and Monthly Total Command Summaries 


These two reports are virtually the same except that the Daily Command 
Summary only reports on the current accounting period while the Monthly 
Total Command Summary tells the story for the start of the fiscal period to 
the current date. In other words, the monthly report reflects the data 
accumulated since the last invocation of monacct. 


The data included in these reports gives an administrator an idea as to the
heaviest used commands and, based on those commands’ characteristics of
system resource utilization, a hint as to what to weigh more heavily when
system tuning.


These reports are sorted by TOTAL KCOREMIN, which is an arbitrary 
yardstick but often a good one for calculating "drain" on a system. 


COMMAND NAME     This is the name of the command. Unfortunately, all
                 shell procedures are lumped together under the name sh
                 since only object modules are reported by the process
                 accounting system. The administrator should monitor
                 the frequency of programs called a.out or core or any
                 other name that does not seem quite right. Often
                 people like to work on their favorite version of
                 backgammon, only they do not want everyone to know
                 about it. Acctcom is also a good tool to use for
                 determining who executed a suspiciously named command
                 and whether superuser privileges were used.

NUMBER CMDS      This is the total number of invocations of this
                 particular command.

TOTAL KCOREMIN   The total cumulative measurement of the amount of
                 kilobyte segments of memory used by a process per
                 minute of run time.

TOTAL CPU-MIN    The total processing time this program has
                 accumulated.

TOTAL REAL-MIN   The total real-time (wall-clock) minutes this program
                 has accumulated. This total is the actual “waited for”
                 time as opposed to kicking off a process in the
                 background.

MEAN SIZE-K      This is the mean of the TOTAL KCOREMIN over the number
                 of invocations reflected by NUMBER CMDS.

MEAN CPU-MIN     This is the mean of the TOTAL CPU-MIN over the number
                 of invocations reflected by NUMBER CMDS.

HOG FACTOR       This is a relative measurement of the ratio of system
                 availability to system utilization. It is computed by
                 the formula

                     (total CPU time) / (elapsed time)

                 This gives a relative measure of the total available
                 CPU time consumed by the process during its execution.

CHARS TRNSFD     This column, which may go negative, is a total count
                 of the number of characters pushed around by the
                 read(2) and write(2) system calls.

BLOCKS READ      A total count of the physical block reads and writes
                 that a process performed.
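
The derived columns can be illustrated with a short sketch. It follows the column definitions as worded in this report description; the function name and argument names are hypothetical and are not part of the accounting package.

```python
def derived_columns(number_cmds, total_kcoremin, total_cpu_min, total_real_min):
    """Compute the per-command means and the hog factor from the totals.

    The arguments correspond to the NUMBER CMDS, TOTAL KCOREMIN,
    TOTAL CPU-MIN, and TOTAL REAL-MIN columns of one report line.
    """
    return {
        # mean of TOTAL KCOREMIN over the number of invocations
        "MEAN SIZE-K": total_kcoremin / number_cmds,
        # mean of TOTAL CPU-MIN over the number of invocations
        "MEAN CPU-MIN": total_cpu_min / number_cmds,
        # (total CPU time) / (elapsed time)
        "HOG FACTOR": total_cpu_min / total_real_min,
    }
```

A command with a hog factor near 1 kept the CPU busy for almost all of its elapsed time; a value near 0 indicates a command that mostly waited.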


5.9.4 Last Login 


This report simply gives the date when a particular login was last used. This 
could be a good source for finding likely candidates for the archives or 
getting rid of unused logins and login directories. 


5.10 Summary 


The UNIX system accounting was designed from a UNIX system 
administrator's point of view. Every possible precaution has been taken to 
ensure that the system will run smoothly and without error. It is important to 
become familiar with the C programs and shell procedures. The manual 
pages should be studied, and it is advisable to keep a printed copy of the 
shell procedures handy. The accounting system should be easy to 
maintain, provide valuable information for the administrator, and provide 
accurate breakdowns of the usage of system resources for charging
purposes.


5.10.1 Files in the /usr/adm directory


diskdiag         diagnostic output during the execution of disk
                 accounting programs

dtmp             output from the acctdusg program

fee              output from the chargefee program, ASCII tacct records

pacct            active process accounting file

pacct?           process accounting files switched via turnacct

Spacct?.MMDD     process accounting files for MMDD during execution of
                 runacct
5.10.2 Files in the /usr/adm/acct/nite directory


active           used by runacct to record progress and print warning
                 and error messages

activeMMDD       same as active after runacct detects an error

cms              ASCII total command summary used by prdaily

ctacct.MMDD      connect accounting records in tacct.h format

ctmp             output of acctcon1 program, connect session records in
                 ctmp.h format

daycms           ASCII daily command summary used by prdaily

daytacct         total accounting records for 1 day in tacct.h format

disktacct        disk accounting records in tacct.h format, created by
                 dodisk procedure

fd2log           diagnostic output during execution of runacct (see
                 cron entry)

lastdate         last day runacct executed in date +%m%d format

lock lock1       used to control serial use of runacct

lineuse          tty line usage report used by prdaily

log              diagnostic output from acctcon1

logMMDD          same as log after runacct detects an error

reboots          contains beginning and ending dates from wtmp, and a
                 listing of reboots

statefile        used to record current state during execution of
                 runacct

tmpwtmp          wtmp file corrected by wtmpfix

wtmperror        place for wtmpfix error messages

wtmperrorMMDD    same as wtmperror after runacct detects an error

wtmp.MMDD        previous day's wtmp file


5.10.3 Files in the /usr/adm/acct/sum directory


cms              total command summary file for current fiscal in
                 internal summary format

cmsprev          command summary file without latest update

daycms           command summary file for yesterday in internal summary
                 format

loginlog         created by lastlogin

pacct.MMDD       concatenated version of all pacct files for MMDD,
                 removed after reboot by remove procedure

rprtMMDD         saved output of prdaily program

tacct            cumulative total accounting file for current fiscal

tacctprev        same as tacct without latest update

tacctMMDD        total accounting file for MMDD

wtmp.MMDD        saved copy of wtmp file for MMDD, removed after reboot
                 by remove procedure


5.10.4 Files in the /usr/adm/acct/fiscal directory 


cms?             total command summary file for fiscal ? in internal
                 summary format

fiscrpt?         report similar to prdaily for fiscal ?

tacct?           total accounting file for fiscal ?


6. FILE SYSTEM CHECKING


The File System Check Program (fsck) is an interactive file system check 
and repair program. Fsck uses the redundant structural information in the 
UNIX system file system to perform several consistency checks. If an 
inconsistency is detected, it is reported to the operator, who may elect to fix 
or ignore each inconsistency. These inconsistencies result from the 
permanent interruption of the file system updates, which are performed 
every time a file is modified. Fsck is frequently able to repair corrupted file 
systems using procedures based upon the order in which the UNIX system 
honors these file system update requests. 


The purpose of this chapter is to describe the normal updating of the file 
system, to discuss the possible causes of file system corruption, and to 
present the corrective actions implemented by fsck. Both the program and 
the interaction between the program and the operator are described. 


Appendix 6-1 contains the fsck error conditions. The meanings of the 
various error conditions, possible responses, and related error conditions are 
explained. 


When a UNIX operating system is brought up, a consistency check of the 
file systems should always be performed. This precautionary measure helps 
to ensure a reliable environment for file storage on disk. If an inconsistency 
is discovered, corrective action must be taken. 


The updating of the file system and the causes of file system corruption
are described in this chapter. Finally, the set of heuristically sound
corrective actions used by fsck is presented.


6.0.1 System Administrator Advice 


Remember that system buffers are 1024 bytes. When configuring the 
operating system, take into consideration that the same number of buffers 
as before will use more main memory. Weigh this against reducing the 
number of buffers, which reduces the cache hit ratio and degrades 
performance. 


6.1 Update of the File System 


Every working day hundreds of files are created, modified, and removed. 
Every time a file is modified, the UNIX operating system performs a series 
of file system updates. These updates, when written on disk, yield a 
consistent file system. To understand what happens in the event of a 
permanent interruption in this sequence, it is important to understand the 
order in which the update requests were probably being honored. Knowing 
which pieces of information were probably written to the file system first, 
heuristic procedures can be developed to repair a corrupted file system.


There are five types of file system updates. These involve the superblock,
inodes, indirect blocks, data blocks (directories and files), and free-list
blocks.


6.1.1 Superblock 


The superblock contains information about the size of the file system, the 
size of the inode list, part of the free-block list, the count of free blocks, the 
count of free inodes, and part of the free-inode list.


The superblock of a mounted file system (the root file system is always 
mounted) is written to the file system whenever the file system is 
unmounted or a sync command is issued. 


6.1.2 Inodes 


An inode contains information about the type of inode (directory, data, or 
special), the number of directory entries linked to the inode, the list of blocks 
claimed by the inode, and the size of the inode. 


An inode is written to the file system upon closure of the file associated with
the inode. (All in-core blocks are also written to the file system upon issue
of a sync system call.)


6.1.3 Indirect Blocks 


There are three types of indirect blocks—single-indirect, double-indirect, and 
triple-indirect. A single-indirect block contains a list of some of the block 
numbers claimed by an inode. Each one of the 128 entries in an indirect 
block is a data-block number. A double-indirect block contains a list of 
single-indirect block numbers. A triple-indirect block contains a list of 
double-indirect block numbers. 
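
The reach of each level of indirection follows directly from the entry count. The sketch below takes the figure of 128 entries per indirect block from the text; the function names are hypothetical and serve only to show the arithmetic.

```python
ENTRIES_PER_INDIRECT = 128  # entries in one indirect block, as quoted in the text


def data_blocks_reachable(level):
    """Data blocks reachable through one indirect block.

    level 1 = single-indirect, 2 = double-indirect, 3 = triple-indirect.
    """
    return ENTRIES_PER_INDIRECT ** level


def indirect_blocks_in_full_tree(level):
    """Indirect blocks needed when a tree of the given level is fully used."""
    return sum(ENTRIES_PER_INDIRECT ** k for k in range(level))
```

For example, a fully used double-indirect block refers to 128 single-indirect blocks, which together refer to 16384 data blocks.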


Indirect blocks are written to the file system whenever they have been 
modified and released by the operating system. More precisely, they are 
queued for eventual writing. Physical I/O is deferred until the buffer is 
needed by the UNIX system or a sync command is issued.


6.1.4 Data Blocks 


A data block may contain file information or directory entries. Each directory 
entry consists of a file name and an inode number. 


Data blocks are written to the file system whenever they have been modified 
and released by the operating system. 


6.1.5 First Free-List Block 


The superblock contains the first free-list block. The free-list blocks are a 
list of all blocks that are not allocated to the superblock, inodes, indirect 
blocks, or data blocks. Each free-list block contains a count of the number
of entries in this free-list block, a pointer to the next free-list block, and a
partial list of free blocks in the file system.


Free-list blocks are written to the file system whenever they have been 
modified and released by the operating system. 


6.2 Corruption of the File System 


A file system can become corrupted in a variety of ways. Improper 
shutdown procedures and hardware failures are the most common. 


6.2.1 Improper System Shutdown and Startup 


File systems may become corrupted when proper shutdown procedures are 
not observed, e.g., forgetting to sync the system prior to halting the CPU, 
physically write-protecting a mounted file system, or taking a mounted file 
system off-line. 


File systems may become further corrupted if a corrupted file system is
allowed to be used (and, thus, to be modified further). The results can be
disastrous.


6.2.2 Hardware Failure 


Any piece of hardware can fail at any time. Failures can be as subtle as a 
bad block on a disk platter or as blatant as a nonfunctional disk controller. 


6.3 Detection and Correction of Corruption 


A quiescent file system (one that is unmounted and not being written on)
may be checked for structural integrity by performing consistency checks on 
the redundant data intrinsic to a file system. The redundant data is either 
read from the file system or computed from other known values. A 
quiescent state is important during the checking of a file system because of 
the multipass nature of the fsck program. 


When an inconsistency is discovered, fsck reports the inconsistency so that
the operator can choose a corrective action.


This part discusses how to discover inconsistencies (and the possible
corrective actions) for the superblock, the inodes, the indirect blocks, the
data blocks containing directory entries, and the free-list blocks. These
corrective actions can be performed interactively by the fsck command
under control of the operator.


6.3.1 Superblock 


One of the most common corrupted items is the superblock. The 
superblock is prone to corruption because every change to the file system’s 
blocks or inodes modifies the superblock. 


The superblock and its associated parts are most often corrupted when the
computer is halted and the last command involving output to the file system
was not a sync command.


The superblock can be checked for inconsistencies involving file system 
size, inode-list size, free-block list, free-block count, and the free-inode 
count. 


6.3.1.1 File System Size and Inode-List Size 


The file system size must be larger than the number of blocks used by the 
superblock and the number of blocks used by the list of inodes. The 
number of inodes must be less than 65,535. The file system size and 
inode-list size are critical pieces of information to the fsck program. While
there is no way to actually check these sizes, fsck can check that they are
within reasonable bounds. All other checks of the file system depend on the
correctness of these sizes.


6.3.1.2 Free-Block List 


The free-block list starts in the superblock and continues through the free-list 
blocks of the file system. Each free-list block can be checked for a list 
count out of range, for block numbers out of range, and for blocks already 
allocated within the file system. A check is made to see that all the blocks 
in the file system were found. 


The first free-block list is in the superblock. Fsck checks the list count for a 
value of less than 0 or greater than 50. It also checks each block number 
for a value of less than the first data block in the file system or greater than 
the last block in the file system. Then it compares each block number to a 
list of already allocated blocks. If the free-list block pointer is nonzero, the 
next free-list block is read in and the process is repeated. 


When all the blocks have been accounted for, a check is made to see if the 
number of blocks used by the free-block list plus the number of blocks 
claimed by the inodes equals the total number of blocks in the file system. 


If anything is wrong with the free-block list, then fsck may rebuild the list, 
excluding all blocks in the list of allocated blocks. 
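
The free-block list checks described above can be sketched as follows. This is an illustration only, not the fsck source: the chain of free-list blocks is modeled as an in-memory list of (count, entries) pairs with the superblock portion first, and the function names and result labels are hypothetical. The final accounting check is taken over the data-block range, which is an assumption of the sketch.

```python
MAX_FREE_ENTRIES = 50  # upper bound on the list count, as quoted in the text


def check_free_list(chain, first_data_block, last_block, allocated):
    """Walk the free-list chain and collect inconsistencies.

    chain     -- list of (count, entries) pairs, superblock portion first
    allocated -- set of block numbers already claimed by inodes
    Returns (problems, free), where free is the set of valid free blocks.
    """
    problems = []
    free = set()
    for index, (count, entries) in enumerate(chain):
        if count < 0 or count > MAX_FREE_ENTRIES:
            problems.append(("BAD COUNT", index))
            break  # the rest of the chain cannot be trusted
        for blk in entries[:count]:
            if blk < first_data_block or blk > last_block:
                problems.append(("BAD BLOCK", blk))
            elif blk in allocated or blk in free:
                problems.append(("DUP BLOCK", blk))
            else:
                free.add(blk)
    return problems, free


def all_blocks_accounted_for(free, allocated, first_data_block, last_block):
    """True when free blocks plus allocated blocks cover the whole range."""
    total = last_block - first_data_block + 1
    return len(free) + len(allocated) == total
```

If either function reports a problem, the remedy described in the text is to rebuild the list from every block in range that is not in the allocated set.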


6.3.1.3 Free-Block Count 


The superblock contains a count of the total number of free blocks within the 
file system. Fsck compares this count to the number of blocks it found free 
within the file system. If the counts do not agree, then fsck may replace the 
count in the superblock by the actual free-block count. 


6.3.1.4 Free-Inode Count 


The superblock contains a count of the total number of free inodes within 
the file system. Fsck compares this count to the number of inodes it found 
free within the file system. If the counts do not agree, then fsck may
replace the count in the superblock by the actual free-inode count.


6.3.2 Inodes


An individual inode is not as likely to be corrupted as the superblock. 
However, because of the great number of active inodes, there is almost as 
likely a chance for corruption in the inode list as in the superblock. 


The list of inodes is checked sequentially starting with inode 1 (there is no 
inode 0) and going to the last inode in the file system. Each inode can be 
checked for inconsistencies involving format and type, link count, duplicate 
blocks, bad blocks, and inode size. 


6.3.2.1 Format and Type 


Each inode contains a mode word. This mode word describes the type and 
state of the inode. Inodes may be one of four types: 


• Regular

• Directory

• Special block

• Special character.


If an inode is not one of these types, then the inode has an illegal type. 
Inodes may be found in one of three states—unallocated, allocated, and 
neither unallocated nor allocated. This last state indicates an incorrectly 
formatted inode. An inode can get in this state if bad data is written into the 
inode list through, for example, a hardware failure. The only possible 
corrective action is for fsck to clear the inode.
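
A minimal sketch of the type and state test follows. It uses the file-type constants from the Python stat module, which follow the conventional UNIX mode-word layout; whether the on-disk mode word of this system uses the same bit values is an assumption, and the function name and result labels are hypothetical.

```python
import stat

# The four legal inode types listed in the text.
LEGAL_TYPES = {
    stat.S_IFREG: "regular",
    stat.S_IFDIR: "directory",
    stat.S_IFBLK: "special block",
    stat.S_IFCHR: "special character",
}


def classify_inode(mode, other_fields_zero):
    """Return (state, type_name) for one inode.

    mode              -- the inode mode word
    other_fields_zero -- True when the rest of the inode is all zeros
    """
    fmt = stat.S_IFMT(mode)  # isolate the file-type bits
    if fmt in LEGAL_TYPES:
        return ("allocated", LEGAL_TYPES[fmt])
    if mode == 0 and other_fields_zero:
        return ("unallocated", None)
    if fmt == 0:
        # neither allocated nor unallocated: can only be cleared
        return ("partially allocated", None)
    return ("unknown type", None)
```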


6.3.2.2 Link Count 


Contained in each inode is a count of the total number of directory entries 
linked to the inode. Fsck verifies the link count of each inode by traversing 
down the total directory structure, starting from the root directory, and 
calculating an actual link count for each inode.


If the stored link count is nonzero and the actual link count is zero, it means 
that no directory entry appears for the inode. If the stored and actual link 
counts are nonzero and unequal, a directory entry may have been added or 
removed without the inode being updated. 


If the stored link count is nonzero and the actual link count is zero, fsck can,
under operator control, link the disconnected file to the lost+found directory.
If the stored and actual link counts are nonzero and unequal, fsck can
replace the stored link count by the actual link count.
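
The link count comparison can be sketched as below. The directory structure is modeled as a mapping from directory inode number to its list of (name, inode number) entries; this representation and the function names are hypothetical, chosen only to show the counting and the two cases described in the text.

```python
from collections import Counter


def actual_link_counts(directories):
    """Count the directory entries that refer to each inode.

    directories -- dict: directory inode number -> list of (name, inode number)
    """
    counts = Counter()
    for entries in directories.values():
        for _name, inum in entries:
            counts[inum] += 1
    return counts


def link_count_action(stored, actual):
    """Suggest the corrective action for one inode."""
    if stored == actual:
        return "ok"
    if stored != 0 and actual == 0:
        return "reconnect to lost+found"
    if stored != 0 and actual != 0:
        return "adjust stored count to %d" % actual
    return "not covered"  # stored == 0 with entries present is not discussed here
```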


6.3.2.3 Duplicate Blocks 


Contained in each inode is a list or pointers to lists (indirect blocks) of all the 
blocks claimed by the inode. Fsck compares each block number claimed by 
an inode to a list of already allocated blocks. If a block number is already 
claimed by another inode, the block number is added to a list of duplicate 
blocks. Otherwise, the list of allocated blocks is updated to include the 
block number. If there are any duplicate blocks, fsck will make a partial 
second pass of the inode list to find the inode of the duplicated block. This 
is necessary because without examining the files associated with these 
inodes for correct content there is not enough information available to 
decide which inode is corrupted and should be cleared. Most of the time, 
the inode with the earliest modify time is incorrect and should be cleared. 
This condition can occur by using a file system with blocks claimed by both 
the free-block list and by other parts of the file system. 


A large number of duplicate blocks in an inode may be due to an indirect 
block not being written to the file system. Fsck will prompt the operator to 
clear both inodes.


6.3.2.4 Bad Blocks 


Contained in each inode is a list or pointer to lists of all the blocks claimed 
by the inode. Fsck checks each block number claimed by an inode for a 
value lower than that of the first data block or greater than the last block in 
the file system. If the block number is outside this range, the block number 
is a bad block number.


If there is a large number of bad blocks in an inode, this may be due to an 
indirect block not being written to the file system. Fsck will prompt the 
operator to clear both inodes. 
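
The duplicate-block and bad-block checks of the two preceding parts can be combined in one pass over the inode list, as in the following sketch. The inode list is modeled as a mapping from inode number to the block numbers it claims; that model and the function names are hypothetical.

```python
def scan_inode_blocks(inodes, first_data_block, last_block):
    """Classify every block number claimed by every inode.

    inodes -- dict: inode number -> list of claimed block numbers
    Returns (bad, dups, allocated), where allocated maps each valid block
    to the first inode that claimed it.
    """
    allocated = {}
    bad, dups = [], []
    for inum in sorted(inodes):  # sequential scan of the inode list
        for blk in inodes[inum]:
            if blk < first_data_block or blk > last_block:
                bad.append((blk, inum))
            elif blk in allocated:
                dups.append((blk, inum))
            else:
                allocated[blk] = inum
    return bad, dups, allocated


def first_claimants(dups, allocated):
    """The partial second pass: find the inode that first claimed each duplicate."""
    return {blk: allocated[blk] for blk, _inum in dups}
```

The pair of inodes reported for each duplicate block is what the operator examines to decide which inode should be cleared.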


6.3.2.5 Size Checks 


Each inode contains a 32-bit (4-byte) size field. This size indicates the 
number of characters in the file associated with the inode. This size can be 
checked for inconsistencies, e.g., directory sizes that are not a multiple of 
16 characters or the number of blocks actually used not matching that 
indicated by the inode size. 


A directory inode within the file system has the directory bit on in the inode 
mode word. The directory size must be a multiple of 16 because a 
directory entry contains 16 bytes (2 bytes for the inode number and 14 bytes 
for the file or directory name). 


Fsck will warn of such directory misalignment. This is only a warning 
because not enough information can be gathered to correct the 
misalignment.


A rough check of the consistency of the size field of an inode can be
performed by computing from the size field the number of blocks that should 
be associated with the inode and comparing it to the actual number of 
blocks claimed by the inode. 


Fsck calculates the number of blocks that there should be in an inode by 
dividing the number of characters in an inode by the number of characters 
per block and rounding up. Fsck adds one block for each indirect block 
associated with the inode. If the actual number of blocks does not match 
the computed number of blocks, fsck will warn of a possible file-size error.
This is only a warning because the UNIX system does not fill in blocks in 
files created in random order. 
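
The two size checks reduce to simple arithmetic, sketched below. The block size of 1024 bytes is taken from the buffer size quoted under "System Administrator Advice" and is assumed here to be the file system block size; the function names are hypothetical. The warning strings are the ones listed in Appendix 6-1.

```python
BLOCK_SIZE = 1024      # assumed file system block size, in characters
DIR_ENTRY_SIZE = 16    # 2-byte inode number plus 14-byte name


def expected_blocks(size, indirect_blocks=0, block_size=BLOCK_SIZE):
    """Blocks an inode of the given size should claim."""
    data_blocks = -(-size // block_size)  # divide and round up
    return data_blocks + indirect_blocks


def size_warnings(size, actual_blocks, is_directory, indirect_blocks=0):
    """Return the warnings that the size checks would raise for one inode."""
    warnings = []
    if is_directory and size % DIR_ENTRY_SIZE != 0:
        warnings.append("DIRECTORY MISALIGNED")
    if expected_blocks(size, indirect_blocks) != actual_blocks:
        warnings.append("POSSIBLE FILE SIZE ERROR")
    return warnings
```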


6.3.3 Indirect Blocks 


Indirect blocks are owned by an inode. Therefore, inconsistencies in indirect
blocks directly affect the inode that owns them.


Inconsistencies that can be checked are blocks already claimed by another 
inode and block numbers outside the range of the file system. 


For a discussion of detection and correction of the inconsistencies 
associated with indirect blocks, see parts “Duplicate Blocks” and “Bad 
Blocks”. 


6.3.4 Data Blocks 


The two types of data blocks are plain data blocks and directory data blocks. 
Plain data blocks contain the information stored in a file. Directory data 
blocks contain directory entries. Fsck does not attempt to check the validity 
of the contents of a plain data block. 


Each directory data block can be checked for inconsistencies involving 
directory inode numbers pointing to unallocated inodes, directory inode 
numbers greater than the number of inodes in the file system, incorrect 
directory inode numbers for “.” and “..”, and directories disconnected from
the file system. In addition, the validity of the contents of a directory's data 
block is checked. 


If a directory entry inode number points to an unallocated inode, then fsck
may remove that directory entry. This condition probably occurred because 
the data blocks containing the directory entries were modified and written 
out while the inode was not yet written out. 


If a directory entry inode number is pointing beyond the end of the inode list, 
fsck may remove that directory entry. This condition occurs if bad data is 
written into a directory data block.


The directory inode number entry for “.” should be the first entry in the
directory data block. Its value should be equal to the inode number for the
directory data block.


The directory inode number entry for “..” should be the second entry in the 
directory data block. Its value should be equal to the inode number for the 
parent of the directory entry (or the inode number of the directory data block 
if the directory is the root directory). 


If the directory inode numbers are incorrect, fsck may replace them with the 
correct values. 
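
The checks on the “.” and “..” entries can be sketched by decoding the 16-byte directory entries described under “Size Checks” (2 bytes of inode number, 14 bytes of name). The byte order used below is an assumption, as are the function names; this is an illustration and not the fsck implementation.

```python
import struct

# 2-byte inode number followed by a 14-byte name; little-endian is assumed.
ENTRY = struct.Struct("<H14s")


def parse_directory(data):
    """Decode a directory data block into a list of (name, inode number)."""
    entries = []
    usable = len(data) - len(data) % ENTRY.size  # ignore a trailing partial entry
    for offset in range(0, usable, ENTRY.size):
        inum, raw = ENTRY.unpack_from(data, offset)
        name = raw.split(b"\0", 1)[0].decode("ascii", "replace")
        entries.append((name, inum))
    return entries


def check_dot_entries(entries, self_inum, parent_inum):
    """Verify that "." is first and ".." is second, with the right inodes.

    For the root directory, pass the root's own inode number as parent_inum.
    """
    problems = []
    if len(entries) < 1 or entries[0] != (".", self_inum):
        problems.append('bad "." entry')
    if len(entries) < 2 or entries[1] != ("..", parent_inum):
        problems.append('bad ".." entry')
    return problems
```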


Fsck checks the general connectivity of the file system. If directories are 
found not to be linked into the file system, fsck will link the directory back 
into the file system in the lost+found directory. This condition can be 
caused by inodes being written to the file system with the corresponding 
directory data blocks not being written to the file system. 


6.3.5 Free-List Blocks 


Free-list blocks are owned by the superblock. Therefore, inconsistencies in 
free-list blocks directly affect the superblock. 


Inconsistencies that can be checked are a list count outside of range, block 
numbers outside of range, and blocks already associated with the file 
system. 


For a discussion of detection and correction of the inconsistencies 
associated with free-list blocks, see part “Free-Block List”.


6.4 Appendix 6-1 (FSCK Error Conditions)


6.4.1 Conventions


Fsck is a multipass file system check program. Each file system pass 
invokes a different phase of the fsck program. After the initial setup, fsck 
performs successive phases over each file system performing cleanup, 
checking blocks and sizes, pathnames, connectivity, reference counts, and 
the free-block list (possibly rebuilding it). 


When an inconsistency is detected, fsck reports the error condition to the 
operator. If a response is required, fsck prints a prompt message and waits 
for a response. This appendix explains the meaning of each error condition, 
the possible responses, and the related error conditions. 


The error conditions are organized by the “Phase” of the fsck program in 
which they can occur. The error conditions that may occur in more than one 
phase will be discussed under Part B. 


6.4.2 Initialization 


Before a file system check can be performed, certain tables have to be set 
up and certain files opened. This section describes the opening of files and 
the initialization of tables. Error conditions resulting from command line 
options, memory requests, opening of files, status of files, file system size 
checks, and creation of the scratch file are listed below. 


6.4.2.1 C option? 


C is not a legal option to fsck; legal options are -y, -n, -s, -S, -t, -r, -q,
and -D. Fsck terminates on this error condition. See the fsck(1M) entry in
the UNIX System V Administrator Reference Manual for further details.


6.4.2.2 Bad -t option


The -t option is not followed by a file name. Fsck terminates on this error 
condition. See the fsck(1M) entry in the UNIX System V Administrator 
Reference Manual for further details. 


6.4.2.3 Invalid -s argument, defaults assumed


The -s option is not suffixed by 3, 4, or blocks-per-cylinder:blocks-to-skip. 
Fsck assumes a default value of 400 blocks-per-cylinder and 9 blocks-to- 
skip. See the fsck(1M) entry in the UNIX System V Administrator 
Reference Manual for further details. 


6.4.2.4 Incompatible options: -n and -s


It is not possible to salvage the free-block list without modifying the file 
system. Fsck terminates on this error condition. See the fsck(1M) entry in 
the UNIX System V Administrator Reference Manual for further details.


6.4.2.5 Can not fstat standard input


Fsck’s attempt to fstat standard input failed. The occurrence of this error
condition indicates a serious problem which may require additional 
assistance. Fsck terminates on this error condition. 


6.4.2.6 Can not get memory 


Fsck’s request for memory for its virtual memory tables failed. The 
occurrence of this error condition indicates a serious problem which may 
require additional assistance. Fsck terminates on this error condition. 


6.4.2.7 Can not open checkall file: F 


The default file system checkall file F (usually /etc/checkall) cannot be
opened for reading. Fsck terminates on this error condition. Check access 
modes of F. 


6.4.2.8 Can not stat root 


Fsck’s request for statistics about the root directory “/” failed. The 
occurrence of this error condition indicates a serious problem which may 
require additional assistance. Fsck terminates on this error condition. 


6.4.2.9 Can not stat F 


Fsck’s request for statistics about the file system F failed. It ignores this file 
system and continues checking the next file system given. Check access 
modes of F.


6.4.2.10 F is not a block or character device


Fsck has been given a regular file name by mistake. It ignores this file
system and continues checking the next file system given. Check file type
of F.


6.4.2.11 Can not open F


The file system F cannot be opened for reading. It ignores this file system
and continues checking the next file system given. Check access modes of
F.


6.4.2.12 Size check: fsize X isize Y


More blocks are used for the inode list Y than there are blocks in the file 
system X, or there are more than 65,535 inodes in the file system. It 
ignores this file system and continues checking the next file system given. 


6.4.2.13 Can not create F 


Fsck’s request to create a scratch file F failed. It ignores this file system 
and continues checking the next file system given. Check access modes of 
F.


6.4.2.14 CAN NOT SEEK: BLK B (CONTINUE)


Fsck’s request for moving to a specified block number B in the file system
failed. The occurrence of this error condition indicates a serious problem 
which may require additional assistance. 


Possible responses to CONTINUE prompt are: 


YES              Attempt to continue to run file system check. Often,
                 however, the problem will persist. This error condition
                 will not allow a complete check of the file system. A
                 second run of fsck should be made to recheck this file
                 system. If the block was part of the virtual memory
                 buffer cache, fsck will terminate with the message
                 “Fatal I/O error”.

NO               Terminate program.


6.4.2.15 CAN NOT READ: BLK B (CONTINUE)


Fsck’s request for reading a specified block number B in the file system
failed. The occurrence of this error condition indicates a serious problem 
which may require additional assistance. 


Possible responses to CONTINUE prompt are: 


YES              Attempt to continue to run file system check. Often,
                 however, the problem will persist. This error condition
                 will not allow a complete check of the file system. A
                 second run of fsck should be made to recheck this file
                 system. If the block was part of the virtual memory
                 buffer cache, fsck will terminate with the message
                 “Fatal I/O error”.

NO               Terminate program.


6.4.2.16 CAN NOT WRITE: BLK B (CONTINUE)


Fsck’s request for writing a specified block number B in the file system
failed. The disk is write-protected. 


Possible responses to CONTINUE prompt are: 


YES              Attempt to continue to run file system check. Often,
                 however, the problem will persist. This error condition
                 will not allow a complete check of the file system. A
                 second run of fsck should be made to recheck this file
                 system. If the block was part of the virtual memory
                 buffer cache, fsck will terminate with the message
                 “Fatal I/O error”.

NO               Terminate program.


6.4.3 PHASE 1: CHECK BLOCKS AND SIZES


This phase concerns itself with the inode list. This part lists error conditions 
resulting from checking inode types, setting up the zero-link-count table, 
examining inode block numbers for bad or duplicate blocks, checking inode 
size, and checking inode format. 


6.4.3.1 UNKNOWN FILE TYPE I=I (CLEAR)


The mode word of the inode I indicates that the inode is not a special
character inode, regular inode, or directory inode.


Possible responses to CLEAR prompt are: 


YES              Deallocate inode I by zeroing its contents. This will
                 always invoke the UNALLOCATED error condition in
                 Phase 2 for each directory entry pointing to this
                 inode.

NO               Ignore this error condition.


6.4.3.2 LINK COUNT TABLE OVERFLOW (CONTINUE)


An internal table for fsck containing allocated inodes with a link count of 
zero has no more room. Recompile fsck with a larger value of MAXLNCNT. 


Possible responses to CONTINUE prompt are: 


YES              Continue with program. This error condition will not
                 allow a complete check of the file system. A second
                 run of fsck should be made to recheck this file
                 system. If another allocated inode with a zero link
                 count is found, this error condition is repeated.

NO               Terminate program.


6.4.3.3 B BAD I=I


Inode I contains block number B with a number lower than the number of
the first data block in the file system or greater than the number of the last
block in the file system. This error condition may invoke the EXCESSIVE
BAD BLKS error condition in Phase 1 if inode I has too many block numbers
outside the file system range. This error condition will always invoke the
BAD/DUP error condition in Phase 2 and Phase 4.


6.4.3.4 EXCESSIVE BAD BLKS I=I (CONTINUE)


There is more than a tolerable number (usually 10) of blocks with a number
lower than the number of the first data block in the file system or greater
than the number of the last block in the file system associated with inode I.


Possible responses to CONTINUE prompt are:


YES              Ignore the rest of the blocks in this inode and
                 continue checking with next inode in the file system.
                 This error condition will not allow a complete check of
                 the file system. A second run of fsck should be
                 made to recheck this file system.

NO               Terminate program.


6.4.3.5 B DUP I=I


Inode I contains block number B which is already claimed by another inode.
This error condition may invoke the EXCESSIVE DUP BLKS error condition
in Phase 1 if inode I has too many block numbers claimed by other inodes.
This error condition will always invoke Phase 1b and the BAD/DUP error
condition in Phase 2 and Phase 4.


6.4.3.6 EXCESSIVE DUP BLKS I=I (CONTINUE) 


There is more than a tolerable number (usually 10) of blocks claimed by 
other inodes. 


Possible responses to CONTINUE prompt are: 


YES              Ignore the rest of the blocks in this inode and
                 continue checking with next inode in the file system.
                 This error condition will not allow a complete check of
                 the file system. A second run of fsck should be
                 made to recheck this file system.

NO               Terminate program.


6.4.3.7 DUP TABLE OVERFLOW (CONTINUE)


An internal table in fsck containing duplicate block numbers has no more 
room. Recompile fsck with a larger value of DUPTBLSIZE. 


Possible responses to CONTINUE prompt are: 


YES Continue with program. This error condition will not 
allow a complete check of the file system. A second 
run of fsck should be made to recheck this file 
system. If another duplicate block is found, this error 
condition will repeat. 


NO Terminate program. 


6.4.3.8 POSSIBLE FILE SIZE ERROR I=I


The inode I size does not match the actual number of blocks used by the
inode. This is only a warning. If the -q option is used, this message is not
printed.


6.4.3.9 DIRECTORY MISALIGNED I=I


The size of a directory inode is not a multiple of the size of a directory entry 
(usually 16). This is only a warning. If the -q option is used, this message
is not printed. 
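

The alignment test amounts to a remainder check. A minimal sketch, assuming the usual 16-byte directory entry; the function and variable names are hypothetical and do not come from fsck:

```shell
#!/bin/sh
# Assumed directory entry size in bytes ("usually 16" per the text).
DIRENT_SIZE=16

# Hypothetical helper: succeeds (exit 0) when a directory inode size in
# bytes is NOT a whole multiple of the directory entry size.
dir_misaligned() {
    [ $(( $1 % DIRENT_SIZE )) -ne 0 ]
}
```

A directory of 160 bytes holds exactly ten entries and passes; one of 100 bytes would draw the warning.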


6.4.3.10 PARTIALLY ALLOCATED INODE I=I (CLEAR)


Inode I is neither allocated nor unallocated.


Possible responses to CLEAR prompt are:


YES Deallocate inode I by zeroing its contents.
NO Ignore this error condition.


6.4.4 PHASE 1B: RESCAN FOR MORE DUPS


When a duplicate block is found in the file system, the file system is 
rescanned to find the inode which previously claimed that block. This part 
lists the error condition when the duplicate block is found. 


6.4.4.1 B DUP I=I


Inode I contains block number B which is already claimed by another inode.
This error condition will always invoke the BAD/DUP error condition in 
Phase 2. Inodes with overlapping blocks may be determined by examining 
this error condition and the DUP error condition in Phase 1. 


6.4.5 PHASE 2: CHECK PATHNAMES 


This phase concerns itself with removing directory entries pointing to error 
conditioned inodes from Phase 1 and Phase 1b. This part lists error 
conditions resulting from root inode mode and status, directory inode 
pointers in range, and directory entries pointing to bad inodes. 


6.4.5.1 ROOT INODE UNALLOCATED. TERMINATING 


The root inode (always inode number 2) has no allocate mode bits. The 
occurrence of this error condition indicates a serious problem which may 
require additional assistance. The program will terminate. 


6.4.5.2 ROOT INODE NOT DIRECTORY (FIX)


The root inode (usually inode number 2) is not of directory inode type.


Possible responses to FIX prompt are:


YES Change the root inode's type to directory. If the
root inode's data blocks are not directory blocks, a
very large number of error conditions will be
produced.


NO Terminate program.


6.4.5.3 DUPS/BAD IN ROOT INODE (CONTINUE)


Phase 1 or Phase 1b has found duplicate blocks or bad blocks in the root
inode (usually inode number 2) for the file system. 


Possible responses to CONTINUE prompt are: 


YES Ignore DUPS/BAD error condition in root inode and 
attempt to continue to run the file system check. If 
root inode is not correct, then this may result in a 
large number of other error conditions. 


NO Terminate program. 


6.4.5.4 I OUT OF RANGE I=I NAME=F (REMOVE)


A directory entry F has an inode number I which is greater than the end of
the inode list.


Possible responses to REMOVE prompt are: 
YES The directory entry F is removed. 
NO Ignore this error condition. 


6.4.5.5 UNALLOCATED I=I OWNER=O MODE=M SIZE=S MTIME=T
NAME=F (REMOVE)


A directory entry F has an inode I without allocate mode bits. The owner O,
mode M, size S, modify time T, and file name F are printed. If the file
system is not mounted and the -n option was not specified, the entry will be
removed automatically if the inode it points to is character size 0.


Possible responses to REMOVE prompt are: 


YES The directory entry F is removed. 
NO Ignore this error condition. 


6.4.5.6 DUP/BAD I=I OWNER=O MODE=M SIZE=S MTIME=T DIR=F
(REMOVE)


Phase 1 or Phase 1b has found duplicate blocks or bad blocks associated
with directory entry F, directory inode I. The owner O, mode M, size S,
modify time T, and directory name F are printed.


Possible responses to REMOVE prompt are: 
YES The directory entry F is removed. 


NO Ignore this error condition. 


6.4.5.7 DUP/BAD I=I OWNER=O MODE=M SIZE=S MTIME=T FILE=F
(REMOVE)


Phase 1 or Phase 1b has found duplicate blocks or bad blocks associated
with directory entry F, inode I. The owner O, mode M, size S, modify time
T, and file name F are printed.


Possible responses to REMOVE prompt are: 


YES The directory entry F is removed. 
NO Ignore this error condition. 


6.4.5.8 BAD BLK B IN DIR I=I OWNER=O MODE=M SIZE=S
MTIME=T


This message only occurs when the -q option is used. A bad block was
found in DIR inode I. Error conditions looked for in directory blocks are
nonzero padded entries, inconsistent "." and ".." entries, and embedded
slashes in the name field. This error message indicates that the user should
at a later time either remove the directory inode if the entire block looks bad
or change (or remove) those directory entries that look bad.


6.4.6 PHASE 3: CHECK CONNECTIVITY 


This phase concerns itself with the directory connectivity seen in Phase 2. 
This part lists error conditions resulting from unreferenced directories and 
missing or full lost+found directories.


6.4.6.1 UNREF DIR I=I OWNER=O MODE=M SIZE=S MTIME=T
(RECONNECT)


The directory inode I was not connected to a directory entry when the file
system was traversed. The owner O, mode M, size S, and modify time T of
directory inode I are printed. Fsck will force the reconnection of a nonempty
directory.


Possible responses to RECONNECT prompt are: 


YES Reconnect directory inode I to the file system in
directory for lost files (usually lost+found). This may
invoke lost+found error condition in Phase 3 if there
are problems connecting directory inode I to
lost+found. This may also invoke CONNECTED
error condition in Phase 3 if link was successful.


NO Ignore this error condition. This will always invoke 
UNREF error condition in Phase 4. 


6.4.6.2 SORRY. NO lost+found DIRECTORY


There is no lost+found directory in the root directory of the file system; fsck
ignores the request to link a directory in lost+found. This will always invoke
the UNREF error condition in Phase 4. Check access modes of lost+found.
See fsck(1M) in the UNIX System V Administrator Reference Manual for 
further details. 


6.4.6.3 SORRY. NO SPACE IN lost+found DIRECTORY


There is no space to add another entry to the lost+found directory in the
root directory of the file system; fsck ignores the request to link a directory
in lost+found. This will always invoke the UNREF error condition in Phase
4. Clean out unnecessary entries in lost+found or make lost+found larger.
See fsck(1M) in the UNIX System V Administrator Reference Manual for 
further details. 


6.4.6.4 DIR I=I1 CONNECTED. PARENT WAS I=I2


This is an advisory message indicating a directory inode I1 was successfully
connected to the lost+found directory. The parent inode I2 of the directory
inode I1 is replaced by the inode number of the lost+found directory.


6.4.7 PHASE 4: CHECK REFERENCE COUNTS 


This phase concerns itself with the link count information seen in Phase 2 
and Phase 3. This part lists error conditions resulting from unreferenced 
files; missing or full lost+found directory; incorrect link counts for files,
directories, or special files; unreferenced files and directories; bad and
duplicate blocks in files and directories; and incorrect total free-inode counts.


6.4.7.1 UNREF FILE I=I OWNER=O MODE=M SIZE=S MTIME=T
(RECONNECT)


Inode I was not connected to a directory entry when the file system was
traversed. The owner O, mode M, size S, and modify time T of inode I are
printed. If the -n option is not set and the file system is not mounted, empty
files will not be reconnected and will be cleared automatically.


Possible responses to RECONNECT prompt are: 


YES Reconnect inode I to file system in the directory for
lost files (usually lost+found). This may invoke
lost+found error condition in Phase 4 if there are
problems connecting inode I to lost+found.


NO Ignore this error condition. This will always invoke 
CLEAR error condition in Phase 4. 


6.4.7.2 SORRY. NO lost+found DIRECTORY


There is no lost+found directory in the root directory of the file system; fsck
ignores the request to link a file in lost+found. This will always invoke
CLEAR error condition in Phase 4. Check access modes of lost+found.


6.4.7.3 SORRY. NO SPACE IN lost+found DIRECTORY 


There is no space to add another entry to the lost+found directory in the
root directory of the file system; fsck ignores the request to link a file in
lost+found. This will always invoke the CLEAR error condition in Phase 4.
Check size and contents of lost+found.


6.4.7.4 (CLEAR) 


The inode mentioned in the immediately previous error condition cannot be 
reconnected. 


Possible responses to CLEAR prompt are: 


YES Deallocate inode mentioned in the immediately 
previous error condition by zeroing its contents. 


NO Ignore this error condition. 


6.4.7.5 LINK COUNT FILE I=I OWNER=O MODE=M SIZE=S MTIME=T
COUNT=X SHOULD BE Y (ADJUST)


The link count for inode I, which is a file, is X but should be Y. The owner
O, mode M, size S, and modify time T are printed.


Possible responses to ADJUST prompt are: 
YES Replace link count of file inode I with Y.
NO Ignore this error condition. 


6.4.7.6 LINK COUNT DIR I=I OWNER=O MODE=M SIZE=S MTIME=T
COUNT=X SHOULD BE Y (ADJUST)


The link count for inode I, which is a directory, is X but should be Y. The
owner O, mode M, size S, and modify time T of directory inode I are printed.


Possible responses to ADJUST prompt are:
YES Replace link count of directory inode I with Y.
NO Ignore this error condition.


6.4.7.7 LINK COUNT F I=I OWNER=O MODE=M SIZE=S MTIME=T
COUNT=X SHOULD BE Y (ADJUST)


The link count for F inode I is X but should be Y. The file name F, owner O,
mode M, size S, and modify time T are printed.


Possible responses to ADJUST prompt are:


YES Replace link count of inode I with Y.
NO Ignore this error condition.


6.4.7.8 UNREF FILE I=I OWNER=O MODE=M SIZE=S MTIME=T
(CLEAR)


Inode I, which is a file, was not connected to a directory entry when the file
system was traversed. The owner O, mode M, size S, and modify time T of
inode I are printed. If the -n option is not set and the file system is not
mounted, empty files will be cleared automatically.


Possible responses to CLEAR prompt are: 


YES Deallocate inode I by zeroing its contents.
NO Ignore this error condition.


6.4.7.9 UNREF DIR I=I OWNER=O MODE=M SIZE=S MTIME=T
(CLEAR)


Inode I, which is a directory, was not connected to a directory entry when
the file system was traversed. The owner O, mode M, size S, and modify
time T of inode I are printed. If the -n option is not set and the file system
is not mounted, empty directories will be cleared automatically. Nonempty
directories will not be cleared.


Possible responses to CLEAR prompt are: 


YES Deallocate inode I by zeroing its contents.
NO Ignore this error condition.


6.4.7.10 BAD/DUP FILE I=I OWNER=O MODE=M SIZE=S MTIME=T
(CLEAR)


Phase 1 or Phase 1b has found duplicate blocks or bad blocks associated
with file inode I. The owner O, mode M, size S, and modify time T of inode I
are printed.


Possible responses to CLEAR prompt are: 


YES Deallocate inode I by zeroing its contents.
NO Ignore this error condition.


6.4.7.11 BAD/DUP DIR I=I OWNER=O MODE=M SIZE=S MTIME=T
(CLEAR)


Phase 1 or Phase 1b has found duplicate blocks or bad blocks associated
with directory inode I. The owner O, mode M, size S, and modify time T of
inode I are printed.


Possible responses to CLEAR prompt are:
YES Deallocate inode I by zeroing its contents.
NO Ignore this error condition.

6.4.7.12 FREE INODE COUNT WRONG IN SUPERBLK (FIX) 


The actual count of the free inodes does not match the count in the 
superblock of the file system. If the -q option is specified, the count will be
fixed automatically in the superblock. 


Possible responses to FIX prompt are: 
YES Replace count in superblock by actual count. 
NO Ignore this error condition. 

6.4.8 PHASE 5: CHECK FREE LIST


This phase concerns itself with the free-block list. This part lists error 
conditions resulting from bad blocks in the free-block list, bad free-blocks 
count, duplicate blocks in the free-block list, unused blocks from the file 
system not in the free-block list, and the total free-block count incorrect. 


6.4.8.1 EXCESSIVE BAD BLKS IN FREE LIST (CONTINUE) 


The free-block list contains more than a tolerable number (usually 10) of 
blocks with a value less than the first data block in the file system or greater 
than the last block in the file system. 


Possible responses to CONTINUE prompt are: 


YES Ignore rest of the free-block list and continue 
execution of fsck. This error condition will always 
invoke “BAD BLKS IN FREE LIST” error condition in 
Phase 5. 


NO Terminate program. 
6.4.8.2 EXCESSIVE DUP BLKS IN FREE LIST (CONTINUE) 


The free-block list contains more than a tolerable number (usually 10) of 
blocks claimed by inodes or earlier parts of the free-block list. 


Possible responses to CONTINUE prompt are: 


YES Ignore the rest of the free-block list and continue 
execution of fsck. This error condition will always 
invoke “DUP BLKS IN FREE LIST” error condition in 
Phase 5. 


NO Terminate program. 


6.4.8.3 BAD FREEBLK COUNT


The count of free blocks in a free-list block is greater than 50 or less than 0. 
This error condition will always invoke the “BAD FREE LIST” condition in 
Phase 5. 


6.4.8.4 X BAD BLKS IN FREE LIST 


X blocks in the free-block list have a block number lower than the first data 
block in the file system or greater than the last block in the file system. This 
error condition will always invoke the “BAD FREE LIST” condition in Phase 
5. 


6.4.8.5 X DUP BLKS IN FREE LIST 


X blocks claimed by inodes or earlier parts of the free-list block were found 
in the free-block list. This error condition will always invoke the “BAD FREE 
LIST” condition in Phase 5. 


6.4.8.6 X BLK(S) MISSING 


X blocks unused by the file system were not found in the free-block list. 
This error condition will always invoke the “BAD FREE LIST” condition in 
Phase 5. 


6.4.8.7 FREE BLK COUNT WRONG IN SUPERBLOCK (FIX) 


The actual count of free blocks does not match the count in the superblock 
of the file system. 


Possible responses to FIX prompt are: 
YES Replace count in superblock by actual count. 
NO Ignore this error condition. 

6.4.8.8 BAD FREE LIST (SALVAGE) 


Phase 5 has found bad blocks in the free-block list, duplicate blocks in the 
free-block list, or blocks missing from the file system. If the -q option is
specified, the free-block list will be salvaged automatically. 


Possible responses to SALVAGE prompt are: 


YES Replace actual free-block list with a new free-block 
list. The new free-block list will be ordered to reduce 
time spent by the disk waiting for the disk to rotate 
into position. 


NO Ignore this error condition. 


6.4.9 PHASE 6: SALVAGE FREE LIST


This phase concerns itself with the free-block list reconstruction. This part 
lists error conditions resulting from the blocks-to-skip and blocks-per-cylinder 
values. 


6.4.9.1 Default free-block list spacing assumed 


This is an advisory message indicating the blocks-to-skip is greater than the 
blocks-per-cylinder, the blocks-to-skip is less than 1, the blocks-per-cylinder 
is less than 1, or the blocks-per-cylinder is greater than 500. The default 
values of 9 blocks-to-skip and 400 blocks-per-cylinder are used. See 
fsck(1M) in the UNIX System V Administrator Reference Manual for further 
details. 
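

The fallback rule in this advisory can be restated as a short shell function. It is a sketch of the conditions listed above rather than fsck's own logic, and the function name is hypothetical; the limits (1, 500) and defaults (9, 400) are the ones given in the text.

```shell
#!/bin/sh
# Hypothetical helper: given blocks-to-skip and blocks-per-cylinder,
# print the pair that would be used, falling back to the defaults
# (9 and 400) when any of the stated range checks fails.
free_list_spacing() {
    skip=$1
    cyl=$2
    if [ "$skip" -gt "$cyl" ] || [ "$skip" -lt 1 ] ||
       [ "$cyl" -lt 1 ] || [ "$cyl" -gt 500 ]; then
        echo "Default free-block list spacing assumed" >&2
        skip=9
        cyl=400
    fi
    echo "$skip $cyl"
}
```

Calling `free_list_spacing 7 200` leaves the values unchanged, while `free_list_spacing 10 600` prints the advisory on standard error and returns the defaults, because 600 exceeds the 500 block limit.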


6.4.10 CLEANUP 


Once a file system has been checked, a few cleanup functions are 
performed. This part lists advisory messages about the file system and 
modify status of the file system. 


6.4.10.1 X files Y blocks Z free 


This is an advisory message indicating that the file system checked 
contained X files using Y blocks leaving Z blocks free in the file system. 


6.4.10.2 ***** BOOT UNIX (NO SYNC!) ***** 


This is an advisory message indicating that a mounted file system or the 
root file system has been modified by fsck. If the UNIX system is not 
rebooted immediately without sync, the work done by fsck may be undone
by the in-core copies of tables the UNIX system keeps. 


6.4.10.3 ***** FILE SYSTEM WAS MODIFIED ***** 


This is an advisory message indicating that the current file system was 
modified by fsck. 


7. LP SPOOLING


The line printer (LP) program is a series of commands that perform diverse 
spooling functions under the UNIX operating system. Since the primary LP 
application is off-line printing, this document focuses mainly on spooling to 
line printers. LP allows administrators to customize the system to spool to a 
collection of line printers of any type and to group printers into logical 
classes in order to maximize the throughput of the devices. Users are 
provided the capabilities of: 


• Queuing and canceling print requests

• Preventing and allowing queuing to devices

• Starting and stopping LP from processing requests

• Changing configuration of printers

• Finding status of the LP system.


This chapter describes the role of an LP administrator in performing 
restricted functions and overseeing the smooth operation of LP. 


Throughout this chapter, each reference of the form name(1M), name(7), or 
name(8) refers to entries in the Sys5 UNIX Administrator Reference 
Manual. References to entries of the form name(N), where "N" is the 
number 1 or 6 possibly followed by a letter, refer to entry name in section N 
of the Sys5 UNIX User Reference Manual. If "N" is a number 2 through 5
possibly followed by a letter, the reference is to entry name in section N of
the Sys5 UNIX Programmer Reference Manual.


7.1 Overview of LP Features 
7.1.1 Definitions 


Several terms must be defined before presenting a brief summary of LP 
commands. LP was designed with the flexibility to meet the needs of
users on different UNIX systems. Changes to the LP configuration are
performed by the lpadmin(1M) command.


LP makes a distinction between printers and printing devices. A device is a
physical peripheral device or a file and is represented by a full UNIX system
pathname. A printer is a logical name that represents a device. At different
points in time, a printer may be associated with different devices. A class is
a name given to an ordered list of printers. Every class must contain at
least one printer. Each printer may be a member of zero or more classes.
A destination is a printer or a class. One destination may be designated as
the system default destination. The lp(1) command will direct all output to
this destination unless the user specifies otherwise. Output that is routed to
a printer will be printed only by that printer, whereas output directed to a
class will be printed by the first available class member.
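

The "first available class member" rule can be pictured with a small shell function. This is an illustrative sketch only: the function name and the idea of passing the ready printers as a list are assumptions, and the real scheduler keeps this state internally.

```shell
#!/bin/sh
# Hypothetical helper: pick the printer that would service a request
# routed to a class.
#   $1 = class members, space separated, in class order
#   $2 = printers currently ready to print, space separated
# Prints the first member (in class order) that is ready; fails if none is.
pick_printer() {
    members=$1
    ready=$2
    for p in $members; do
        for r in $ready; do
            if [ "$p" = "$r" ]; then
                echo "$p"
                return 0
            fi
        done
    done
    return 1
}
```

With a class whose ordered members are pr2 and pr1, `pick_printer "pr2 pr1" "pr1 pr2"` selects pr2 because it comes first in the class, whereas `pick_printer "pr2 pr1" "pr1"` selects pr1 because pr2 is not ready.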


Each invocation of lp creates an output request that consists of the files to
be printed and options from the lp command line. An interface program
which formats requests must be supplied for each printer. The LP
scheduler, lpsched(1M), services requests for all destinations by routing
requests to interface programs to do the printing on devices. An LP
configuration for a system consists of devices, destinations, and interface
programs.


7.1.2 Commands 
7.1.2.1 Commands for General Use 


The lp(1) command is used to request the printing of files. It creates an
output request and returns a request id of the form


dest-seqno


to the user, where seqno is a unique sequence number across the entire LP 
system and dest is the destination where the request was routed. 
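

A request id can be split into its two parts with ordinary shell parameter expansion. The sketch below assumes the dest-seqno form shown above and that the destination name itself contains no hyphen; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for taking apart a request id of the form dest-seqno.

# Everything before the last hyphen is the destination.
request_dest() {
    echo "${1%-*}"
}

# Everything after the last hyphen is the sequence number.
request_seqno() {
    echo "${1##*-}"
}
```

For a request id such as pr1-42, `request_dest` yields the destination and `request_seqno` yields the sequence number.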


Cancel is used to cancel output requests. The user supplies request ids as
returned by lp, or printer names, in which case the requests currently
printing on those printers are canceled.


Disable prevents lpsched from routing output requests to printers.
Enable(1) allows lpsched to route output requests to printers.


7.1.2.2 Commands for LP Administrators


Each LP system must designate a person or persons as LP administrator to
perform the restricted functions listed below. Either the superuser or any
user who is logged into the UNIX system as lp qualifies as an LP
administrator. All LP files and commands are owned by lp except for
lpadmin and lpsched which are owned by root. The following commands
will be described in more detail later in this chapter.


lpadmin(1M) Modifies LP configuration. Many features of this
command cannot be used when lpsched is running.


lpsched(1M) Routes output requests to interface programs which
do the printing on devices.


lpshut Stops lpsched from running. All printing activity is
halted, but other LP commands may still be used.


accept(1M) Allows lp to accept output requests for destinations.


reject Prevents lp from accepting requests for destinations.


lpmove Moves output requests from one destination to
another. Whole destinations may be moved at one
time. This command cannot be used when lpsched
is running.


7.2 Building LP 


All LP commands are built from source code that resides in the
/usr/src/cmd/lp directory including the make file, lp.mk. Unless some of the
definitions in lp.mk are changed, LP may be installed only by the superuser.
Before installing a new LP system, make sure there is a login called lp on
your system and that the spool directory, /usr/spool/lp, does not exist. To
install LP, perform the following:


cd /usr/src/cmd/lp
make -f lp.mk install


This builds all LP commands and creates an initial LP configuration 
consisting of no printers, classes, or default destination. LP must be 
configured by an LP administrator using the lpadmin command in order to 
create a useful spooler. 


In addition, add the following code to /etc/rc:


rm -f /usr/spool/lp/SCHEDLOCK
/usr/lib/lpsched
echo "LP scheduler started"


This starts the LP scheduler each time that the UNIX system is restarted. 


Several variables in lp.mk may be changed before installing LP to customize
the system:


Variable   Default Value    Meaning

SPOOL      /usr/spool/lp    spool directory
ADMIN      lp               logname of LP Administrator
GROUP      bin              group owning LP commands/data
ADMDIR     /usr/lib         commands of administrator
USRDIR     /usr/bin         user commands reside here


If an existing LP spool directory is corrupted (but not the LP programs) or if
it needs to be rebuilt from scratch, make sure that lpsched is not running
and perform the following as superuser:


1. Make copies of any interface programs that are not standard LP
software. DO NOT make these copies underneath the spool directory.
The pathname for printer "p" is /usr/spool/lp/interface/p.


2. rm -fr /usr/spool/lp


3. make -f lp.mk new (This recreates the bare LP configuration
described above.)


PRECAUTIONS 


1. Some LP commands invoke other LP commands. Moving them after 
they are built will cause some commands to fail.


2. The files under the SPOOL directory should be modified only by LP 
commands. 


3. All LP commands require set-user-id permission. If this is removed, 
the commands will fail. 


7.3 Configuring LP: the "lpadmin" Command


Changes to the LP configuration should be made by using the lpadmin
command and not by hand. Lpadmin will not attempt to alter the LP
configuration when lpsched is running, except where explicitly noted below.


7.3.1 Introducing New Destinations 


The following information must be supplied to lpadmin when introducing a
new printer: 


1. The printer name (-pprinter) is an arbitrary name which must conform
to the following rules:


• It must be no longer than 14 characters.


• It must consist solely of alphanumeric characters and
underscores.


• It must not be the name of an existing LP destination (printer or
class).


2. The device associated with the printer (-vdevice). This is the
pathname of a hard-wired printer, a login terminal, or other file that is
writable by lp.


3. The printer interface program. This may be specified in one of three
ways:


• It may be selected from a list of model interfaces supplied with
LP (-mmodel).


• It may be the same interface that an existing printer uses
(-eprinter).


• It may be a program supplied by the LP administrator
(-iinterface).
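

The first two naming rules for a new printer (length and character set) can be checked before lpadmin is invoked. The function below is a sketch with a hypothetical name; it rejects the empty string as an added assumption and does not test the third rule, since that requires knowing which destinations already exist.

```shell
#!/bin/sh
# Hypothetical pre-check for a proposed printer (or class) name.
# Succeeds only if the name is 1 to 14 characters long and consists
# solely of letters, digits, and underscores.
valid_printer_name() {
    name=$1
    [ -n "$name" ] || return 1            # assumption: empty names are invalid
    [ "${#name}" -le 14 ] || return 1     # no longer than 14 characters
    case $name in
        *[!A-Za-z0-9_]*) return 1 ;;      # any other character is rejected
    esac
    return 0
}
```

Names such as pr1 or laser_2 pass; pr-1 fails on the hyphen, and a 15-character name fails on length.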


Information which need not always be supplied when creating a new printer
includes:


1. The user may specify -h to indicate that the device for the printer is
hardwired or the device is the name of a file (this is assumed by
default). If, on the other hand, the device is the pathname of a login
terminal, then -l must be included on the command line. This
indicates to lpsched that it must automatically disable this printer each
time lpsched starts running. This fact is reported by lpstat when it
indicates printer status:


$ lpstat -pa
printer a (login terminal) disabled Oct 31 11:15 -
disabled by scheduler: login terminal


This is done because device names for login terminals can be (and
usually are) associated with different physical devices from day to day.
If the scheduler did not take this action, somebody might log in and be
surprised that LP is spooling to his/her terminal!


2. The new printer may be added to an existing class or added to a new
class (-cclass). New class names must conform to the same rules as
new printer names.


EXAMPLES


The following examples will be referenced by further examples in later
sections.


1. Create a printer called pr1 whose device is /dev/printer and whose
interface program is the model hp interface:


$ /usr/lib/lpadmin -ppr1 -v/dev/printer -mhp


2. Add a printer called pr2 whose device is /dev/tty22 and whose
interface is a variation of the model prx interface. It is also a login
terminal:


$ cp /usr/spool/lp/model/prx xxx
< edit xxx >
$ /usr/lib/lpadmin -ppr2 -v/dev/tty22 -ixxx -l


3. Create a printer called pr3 whose device is /dev/tty23. The pr3 will be
added to a new class called cl1 and will use the same interface as
printer pr2:


$ /usr/lib/lpadmin -ppr3 -v/dev/tty23 -epr2 -ccl1


7.3.2 Modifying Existing Destinations 


Modifications to existing destinations must always be made with respect to a
printer name (-pprinter). The modifications may be one or more of the
following:


1. The device for the printer may be changed (-vdevice). If this is the
only modification, then this may be done even while lpsched is
running. This facilitates changing devices for login terminals.


2. The printer interface program may be changed (-mmodel, -eprinter,
-iinterface).


3. The printer may be specified as hardwired (-h) or as a login terminal
(-l).


4. The printer may be added to a new or existing class (-cclass).


5. The printer may be removed from an existing class (-rclass).
Removing the last remaining member of a class causes the class to be
deleted. No destination may be removed if it has pending requests.
In that case, lpmove or cancel should be used to move or delete the
pending requests.


EXAMPLES 


These examples are based on the LP configuration created by those in the 
previous section. 


1. Add printer pr2 to class cl1:


$ /usr/lib/lpadmin -ppr2 -ccl1


2. Change pr2's interface program to the model prx interface, change its
device to /dev/tty24, and add it to a new class called cl2:


$ /usr/lib/lpadmin -ppr2 -mprx -v/dev/tty24 -ccl2


Note that printers pr2 and pr3 now use different interface programs
even though pr3 was originally created with the same interface as pr2.
Printer pr2 is now a member of two classes.


3. Specify printer pr2 as a hard-wired printer:


$ /usr/lib/lpadmin -ppr2 -h


4. Add printer pr1 to class cl2:


$ /usr/lib/lpadmin -ppr1 -ccl2


The members of class cl2 are now pr2 and pr1, in that order.
Requests routed to class cl2 will be serviced by pr2 if both pr2 and pr1
are ready to print; otherwise, they will be printed by the one which is
next ready to print.


5. Remove printers pr2 and pr3 from class cl1:


$ /usr/lib/lpadmin -ppr2 -rcl1
$ /usr/lib/lpadmin -ppr3 -rcl1


Since pr3 was the last remaining member of class cl1, the class is
removed.


6. Add pr3 to a new class called cl3:


$ /usr/lib/lpadmin -ppr3 -ccl3


7.3.3 Specifying the System Default Destination


The system default destination may be changed even when Ipsched is 
running. 


EXAMPLES


1. Establish class cl1 as the system default destination:


$ /usr/lib/lpadmin -dcl1


2. Establish no default destination:


$ /usr/lib/lpadmin -d


7.3.4 Removing Destinations


Classes and printers may be removed only if there are no pending requests 
that were routed to them. Pending requests must either be canceled using 
cancel or moved to other destinations using lpmove before destinations
may be removed. If the removed destination is the system default 
destination, then the system will have no default destination until the default 
destination is respecified. When the last remaining member of a class is 
removed, then the class is also removed. The removal of a class never 
implies the removal of printers. 


EXAMPLES


1. Make printer pr1 the system default destination:


$ /usr/lib/lpadmin -dpr1


Remove printer pr1:


$ /usr/lib/lpadmin -xpr1


Now there is no system default destination.


2. Remove printer pr2:


$ /usr/lib/lpadmin -xpr2


Class cl2 is also removed since pr2 was its only member.


3. Remove class cl3:


$ /usr/lib/lpadmin -xcl3


Class cl3 is removed, but printer pr3 remains.


7.4 Making an Output Request: the "lp" Command


Once LP destinations have been created, users may request output by 
using the lp command. The request id that is returned may be used to see
if the request has been printed or to cancel the request. 


The LP program determines the destination of a request by checking the 
following list in order: 


• If the user specifies -ddest on the command line, then the request is
routed to dest.


• If the environment variable LPDEST is set, the request is routed to
the value of LPDEST.


• If there is a system default destination, then the request is routed
there.


• Otherwise, the request is rejected.
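

The lookup order above can be written out as a shell function. This is a sketch of the rule as stated, not the lp source; the function name and its two arguments (the -d value and the system default, either of which may be empty) are hypothetical.

```shell
#!/bin/sh
# Hypothetical model of how a destination is chosen for a request.
#   $1 = destination given with -d on the command line (may be empty)
#   $2 = system default destination (may be empty)
# LPDEST is read from the environment, as described in the text.
resolve_dest() {
    if [ -n "$1" ]; then
        echo "$1"                  # 1. explicit -ddest wins
    elif [ -n "${LPDEST:-}" ]; then
        echo "$LPDEST"             # 2. then the LPDEST variable
    elif [ -n "$2" ]; then
        echo "$2"                  # 3. then the system default
    else
        echo "request rejected: no destination" >&2
        return 1                   # 4. otherwise the request is rejected
    fi
}
```

A user who has LPDEST set to zoo and runs lp with no -d option would thus be routed to zoo even when a system default exists; adding -dxyz overrides both.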


EXAMPLES


1. There are at least four ways to print the password file on the system
default destination:


lp /etc/passwd
lp < /etc/passwd
cat /etc/passwd | lp
lp -c /etc/passwd


The last three ways cause copies of the file to be printed, whereas the
first way prints the file directly. Thus, with the first way, if the file is
modified between the time the request is made and the time it is actually
printed, then the changes will be reflected in the output.


2. Print two copies of file abc on printer xyz and title the output "my file":


pr abc | lp -dxyz -n2 -t"my file"


3. Print file xxx on a Diablo* 1640 printer called zoo in 12-pitch and write
to the user's terminal when printing has completed:


lp -dzoo -o12 -w xxx


In this example, "12" is an option that is meaningful to the model
Diablo 1640 interface program that prints output in 12-pitch mode [see
lpadmin(1M)].


7.5 Finding LP Status—LPSTAT 


The lpstat command is used to find status information about LP requests,
destinations, and the scheduler. 


EXAMPLES 
1. List the status of all pending output requests made by this user: 
lpstat 


The status information for a request includes the request id, the 
logname of the user, the total number of characters to be printed, and 
the date and time the request was made. 


2. List the status of printers p1 and p2:
lpstat -pp1,p2


7.6 Canceling Requests—CANCEL


The LP requests may be canceled using the cancel command. Two kinds 
of arguments may be given to the command—request ids and printer 
names. The requests named by the request ids are canceled and requests 
that are currently printing on the named printers are canceled. Both types of 
arguments may be intermixed. 


EXAMPLE 
Cancel the request that is now printing on printer xyz: 
cancel xyz 


If the user that is canceling a request is not the same one that made the
request, then mail is sent to the owner of the request. LP allows any user to 
cancel requests in order to eliminate the need for users to find LP 
administrators when unusual output should be purged from printers. 


* Registered trademark of Xerox Corporation


7.7 Allowing and Refusing Requests—ACCEPT and REJECT


When a new destination is created, lp will reject requests that are routed to
it. When the LP administrator is sure that it is set up correctly, he or she
should allow lp to accept requests for that destination. The accept
command performs this function.


Sometimes it is necessary to prevent lp from routing requests to
destinations. If printers have been removed or are waiting to be repaired or
if too many requests are building for printers, then it may be desirable to
cause lp to reject requests for those destinations. The reject command
performs this function. After the condition that led to the rejection of 
requests has been remedied, the accept command should be used to allow 
requests to be taken again. 


The acceptance status of destinations is reported by the -a option of lpstat.


EXAMPLES
1. Cause lp to reject requests for destination xyz:
/usr/lib/reject -r"printer xyz needs repair" xyz
Any users that try to route requests to xyz will encounter the following:


$ lp -dxyz file
lp: can not accept requests for destination "xyz"
-- printer xyz needs repair


2. Allow lp to accept requests routed to destination xyz:
/usr/lib/accept xyz


7.8 Allowing and Inhibiting Printing—ENABLE and DISABLE 


The enable command allows the LP scheduler to print requests on printers. 
That is, the scheduler routes requests only to the interface programs of 
enabled printers. Note that it is possible to enable a printer and at the same 
time prevent further requests from being routed to it. 


The disable command will undo the effects of the enable command. It 
prevents the scheduler from routing requests to printers, independently of 
whether or not Ip is allowing them to accept requests. Printers may be 
disabled for several reasons including malfunctioning hardware, paper jams, 
and end of day shutdowns. If a printer is busy at the time it is disabled, then 
the request that was printing will be reprinted in its entirety either on another 
printer (if the request was originally routed to a class of printers) or on the 
same one when the printer is reenabled. The -c option causes the currently
printing requests on busy printers to be canceled in addition to disabling the
printers. This is useful if strange output is causing a printer to behave
abnormally.


EXAMPLE
Disable printer xyz because of a paper jam:


$ disable -r"paper jam" xyz
printer "xyz" now disabled


Find the status of printer xyz:


$ lpstat -pxyz
printer "xyz" disabled since Jan 5 10:15 -
paper jam


Now, reenable xyz:


$ enable xyz
printer "xyz" now enabled


7.9 Moving Requests Between Destinations—LPMOVE 


Occasionally, it is useful for LP administrators to move output requests 
between destinations. For instance, when a printer is down for repairs, it 
may be desirable to move all of its pending requests to a working printer. 
This is one way to use the lpmove command. The other use of this
command is to move specific requests to a different destination. Lpmove 
will refuse to move requests while the LP scheduler is running. 


EXAMPLES 
1. Move all requests for printer abc to printer xyz:
$ /usr/lib/lpmove abc xyz


All of the moved requests are renamed from abc-nnn to xyz-nnn. As a
side effect, destination abc is no longer accepting further requests.


2. Move requests zoo-543 and abc-1200 to printer xyz:
$ /usr/lib/lpmove zoo-543 abc-1200 xyz
The two requests are now renamed xyz-543 and xyz-1200.
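

The renaming rule in these examples (the destination part of the request id
changes, the sequence number is kept) can be illustrated with a short shell
function. The name moved_id is hypothetical and is not an LP command; it
only computes the new id and moves nothing.

```shell
# moved_id REQUEST NEWDEST
# Hypothetical helper: print the id a request would carry after being
# moved to NEWDEST, keeping its sequence number.
moved_id() {
    seq=${1##*-}          # text after the last "-" is the sequence number
    echo "$2-$seq"
}

moved_id zoo-543 xyz      # prints xyz-543
```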
7.10 Stopping and Starting the Scheduler—LPSHUT and LPSCHED 


Lpsched is the program that routes the output requests that were made
with lp through the appropriate printer interface programs to be printed on
line printers. Each time the scheduler routes a request to an interface
program, it records an entry in the log file, /usr/spool/lp/log. This entry
contains the logname of the user that made the request, the request id, the
name of the printer that the request is being printed on, and the date and
time that printing first started. In the case that a request has been restarted,
more than one entry in the log file may refer to the request. The scheduler
also records error messages in the log file. When lpsched is started, it
renames /usr/spool/lp/log to /usr/spool/lp/oldlog and starts a new log file.


No printing will be performed by the LP system unless lpsched is running.
Use the command


lpstat -r


to find the status of the LP scheduler.


Lpsched is normally started by the /etc/rc program as described above and 
continues to run until the UNIX system is shut down. The scheduler 
operates in the /usr/spool/lp directory. When it starts running, it will exit
immediately if a file called SCHEDLOCK exists. Otherwise, it creates this 
file in order to prevent more than one scheduler from running at the same 
time. 


Occasionally, it is necessary to shut down the scheduler in order to 
reconfigure LP or to rebuild the LP software. The command 


/usr/lib/lpshut


causes lpsched to stop running and terminates all printing activity. All
requests that were in the middle of printing will be reprinted in their entirety 
when the scheduler is restarted. 


To restart the LP scheduler, use the command 
/usr/lib/lpsched 


Shortly after this command is entered, lpstat should report that the
scheduler is running. If not, it is possible that a previous invocation of
lpsched exited without removing SCHEDLOCK, so try the following:


rm -f /usr/spool/lp/SCHEDLOCK
/usr/lib/lpsched


The scheduler should be running now. 
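

The SCHEDLOCK behavior described in this section can be imitated in a few
lines of shell. This is a sketch only: start_scheduler is a hypothetical
stand-in for lpsched, the directory is passed as an argument so that the
sketch can be tried outside /usr/spool/lp, and the check-then-create
sequence is not atomic.

```shell
# start_scheduler DIR
# Hypothetical stand-in for the lock check lpsched is described as making.
# Prints "started" and creates DIR/SCHEDLOCK, or prints "locked" and fails
# if the lock file already exists.
start_scheduler() {
    lock="$1/SCHEDLOCK"
    if [ -f "$lock" ]; then
        echo "locked"
        return 1
    fi
    : > "$lock"           # create the lock file
    echo "started"
}
```

A stale lock left by an earlier run makes every later start fail until the
file is removed, which is why the rm -f step above is sometimes needed.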
7.11 Printer Interface Programs 


Every LP printer must have an interface program which does the actual 
printing on the device that is currently associated with the printer. Interface 
programs may be shell procedures, C programs, or any other executable 
program. The LP model interfaces are all written as shell procedures and 
can be found in the /usr/spool/lp/model directory. At the time lpsched
routes an output request to a printer P, the interface program for P is
invoked in the directory /usr/spool/lp as follows:


interface/P id user title copies options file ...


where

id       is the request id returned by lp
user     is the logname of the user who made the request
title    is an optional title specified by the user
copies   is the number of copies requested by the user
options  is a blank-separated list of class or
         printer-dependent options specified by the user
file     is the full pathname of a file to be printed


EXAMPLES 


The following examples are requests made by user “smith” with a system 
default destination of printer “xyz”. Each example lists an lp command line 
followed by the corresponding command line generated for printer xyz's 
interface program: 


1. lp /etc/passwd /etc/group
interface/xyz xyz-52 smith "" 1 "" /etc/passwd /etc/group


2. pr /etc/passwd | lp -t"users" -n5
interface/xyz xyz-53 smith users 5 "" /usr/spool/lp/request/xyz/d0-53


3. lp /etc/passwd -oa -ob
interface/xyz xyz-54 smith "" 1 "a b" /etc/passwd


When the interface program is invoked, its standard input comes from 
/dev/null and both the standard output and standard error output are 
directed to the printer's device. Devices are opened for reading as well as 
writing when file modes permit. In the case where a device is a regular file, 
all output is appended to the end of the file. 
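

To make the calling convention concrete, the following is a minimal sketch
of an interface program, written as a shell function so that it can be tried
without LP. It is not one of the supplied model interfaces: the name
interface_sketch and the banner format are assumptions, and the options
argument is accepted but ignored.

```shell
# interface_sketch id user title copies options file ...
# Hypothetical minimal interface: prints a one-line banner, then each file
# "copies" times, on standard output (the printer device under LP).
interface_sketch() {
    id=$1 user=$2 title=$3 copies=$4 options=$5
    shift 5                       # remaining arguments are the files
    echo "Request: $id  User: $user  Title: $title"
    n=0
    while [ "$n" -lt "$copies" ]; do
        for f in "$@"; do
            cat "$f"
        done
        n=$((n + 1))
    done
    return 0                      # exit code 0 reports success
}
```

A real interface program would also set stty modes and act on its options,
as described below.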


Given the command line arguments and the output directed to a device, 
interface programs may format their output in any way they choose. 
Interface programs must ensure that the proper stty modes (terminal 
characteristics such as baud rate, output options, etc.) are in effect on the 
output device. This may be done in a shell interface only if the device is
opened for reading:


stty mode ... <&1


That is, take the standard input for the stty command from the device.


When printing has completed, it is the responsibility of the interface program 
to exit with a code indicative of the success of the print job. Exit codes are 
interpreted by lpsched as follows:


CODE              MEANING TO LPSCHED

0                 The print job has completed successfully.

1 to 127          A problem was encountered in printing this
                  particular request (e.g., too many nonprintable
                  characters). This problem will not affect future print
                  jobs. Lpsched notifies users by mail that there
                  was an error in printing the request.

greater than 127  These codes are reserved for internal use by
                  lpsched. Interface programs must not exit with
                  codes in this range.
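

The three ranges in the table can be expressed as a small shell function,
which may be handy when testing an interface program by hand. The name
classify_exit and the wording of its output are illustrative assumptions.

```shell
# classify_exit CODE
# Hypothetical helper: report which row of the exit-code table CODE
# falls into.
classify_exit() {
    if [ "$1" -eq 0 ]; then
        echo "success"
    elif [ "$1" -le 127 ]; then
        echo "request failed; user notified by mail"
    else
        echo "reserved for lpsched"
    fi
}

classify_exit 2       # prints: request failed; user notified by mail
```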


When problems that are likely to affect future print jobs occur (e.g., a device 
filter program is missing), the interface programs would be wise to disable 
printers so that print requests are not lost. When a busy printer is disabled, 
the interface program will be terminated with signal 15. 


7.12 Setting Up Hard-Wired Devices and Login Terminals as LP Printers


7.12.1 Hard-wired Devices 


As an example of how to set up a hard-wired device for use as an LP
printer, consider using tty line 15 as printer xyz. As superuser, perform the
following:


1. Avoid unwanted output from non-LP processes and ensure that LP
can write to the device:


$ chown lp /dev/tty15
$ chmod 600 /dev/tty15


2. Change /etc/inittab so that tty15 is not a login terminal. In other
words, ensure that /etc/getty is not trying to log users in at this
terminal. Change the entries for tty15 to:


15:2:off:/etc/getty -t60 tty15 1200


Enter the command:


$ telinit Q


If there is currently an invocation of /etc/getty running on tty15, kill it.
When the UNIX system is rebooted, tty15 will be initialized with default
stty modes. Thus, it is up to LP interface programs to establish the
proper baud rate and other stty modes for correct printing to occur.


3. Introduce printer xyz to LP using the model prx interface program:


$ /usr/lib/lpadmin -pxyz -v/dev/tty15 -mprx


4. When xyz is created, it will initially be disabled and lp will be rejecting
requests routed to it. If it is desired, allow lp to accept requests for
xyz:


/usr/lib/accept xyz


This will allow requests to build up for xyz and to be printed when it is
enabled at a later time.


5. When it is desired for printing to occur, be sure that the printer is ready
to receive output. For several printers, this means that the top of form
has been adjusted and that the printer is on-line. Enable printing to
occur on xyz:


enable xyz 


When requests have been routed to xyz, they will begin printing. 


7.12.2 Login Terminals 


Login terminals may also be used as LP printers. To do this for a Diablo 
1640 terminal called abc, perform the following: 


1. Introduce printer abc to LP using the model 1640 interface program:


$ /usr/lib/lpadmin -pabc -v/dev/null -m1640


Note that /dev/null is used as abc's device because we will specify the
actual device each time that abc is enabled. This device may be
different from day to day. When abc is created, it will initially be
disabled, and lp will be rejecting requests routed to it. If it is desired,
allow lp to accept requests for abc:


/usr/lib/accept abc 


This will allow requests to build up for abc and to be printed when it is 
enabled at a later time. It is not advisable to enable abc for printing, 
however, until the following steps have been taken. 


2. Log terminal in if this has not already been done.


3. Assuming the tty(1) command reports that this terminal is /dev/tty02,
associate this device with printer abc:


$ /usr/lib/lpadmin -pabc -v/dev/tty02


Note that lpadmin may be used only by an LP administrator. If it is
desired for other users to routinely perform this step, then an LP
administrator may establish a program owned by lp or by root with
set-user-id permission that performs this function.


4. When it is desired for printing to occur, be sure that the printer is ready
to receive output. For several printers, this means that the top of form
has been adjusted. Enable printing to occur on abc:


enable abc


When requests have been routed to abc, they will begin printing.


5. When all printing has stopped on abc or when you want it back as a
regular login terminal, you may prevent it from printing more output:


$ disable abc
printer "abc" now disabled


If abc is enabled when the UNIX system is rebooted or when lpsched
is restarted, it will be disabled automatically.


7.13 Summary 


The administrative functions of the LP administrator have been described in 
detail. These functions include configuring and reconfiguring LP;
maintaining printer interface programs; accepting, rejecting, and moving print 
requests; stopping and starting the LP scheduler; and enabling and disabling 
printers. LP offers administrators the following advantages over other 
centrally supported printer packages: 


• Printers may be grouped into classes.


• LP may be configured to meet the needs of each site.


• Administrators may supply interface programs to format output in any
way desirable.


• LP functions are performed by simple commands and not by hand.


8. VIRTUAL PROTOCOL MACHINE


This document describes the UNIX Virtual Protocol Machine (VPM). VPM is
a general-purpose UNIX interface for synchronous communications lines.
VPM allows link-level protocols such as BISYNC and HDLC to be
implemented on the Plexus ICP microcomputer in a high-level language. The
hardware required to support VPM is a Plexus host computer and an ICP.
The link-level communications protocol is executed by the VPM interpreter
running in the Plexus ICP. This implementation technique leads to a
portable protocol representation and efficient protocol execution.


The VPM software consists of a protocol compiler, a UNIX driver, an 
interpreter that executes in the Plexus ICP, and several utility programs. The 
compiler, which executes in the host computer, translates a protocol 
described in a high-level language into a load module for the ICP. The load 
module contains the VPM interpreter and a compiled representation of the 
protocol. The interpreter executes the protocol, communicates with the UNIX 
driver in the host computer, and controls the communications line interface. 


The first release of VPM supported a large class of protocols collectively 
known as BISYNC. These protocols are distinguished by the use of control 
characters to provide framing and transparency. At the frame level, these 
protocols operate in a half-duplex manner, although they sometimes use 
full-duplex communications facilities to reduce the time required to reverse 
the direction of transmission. 


The current release of VPM adds support for bit-oriented, full-duplex protocols. This
class of protocols includes IBM’s Synchronous Data Link Control (SDLC) 
and the international standard High-Level Data Link Control (HDLC). LAPB, 
a subset of HDLC which is the link-level protocol specified in the BX.25 Bell 
System Standard, has been implemented using VPM and is available with 
the Sys5 release. The interpreter used for bit-oriented protocols is different 
from that used for character-oriented (BISYNC) protocols. The appropriate 
interpreter is selected by means of a compiler option. 


Other features of VPM include: 


1. an increase in the number of transmit and receive buffers which the
interpreter can accept at one time,


2. additional debugging facilities,


3. provisions for interprocess communication between the protocol script
and a UNIX driver or a user process, and


4. a cleaner separation of functions in the UNIX driver to facilitate
tailoring of VPM to particular applications.


8.1 Support for Bit-Oriented Protocols


The capability to use bit-oriented protocols such as HDLC is provided by a 
new set of communications primitives. These primitives are frame-oriented 
and non-blocking, whereas the BISYNC primitives are character-oriented 
and blocking. The new primitives are fully described in the attached manual
entry for vpmc(1C). An overview of these primitives follows.


The VPM interpreter maintains a set of queues for transmit buffers. When a 
transmit buffer is passed to the ICP by the UNIX driver, the buffer is 
appended to the unopened-transmit-buffer queue. The protocol script in the 
ICP obtains a transmit buffer from the unopened-transmit-buffer queue by 
means of the getxfrm primitive; the buffer is then said to be open. In order 
to get (open) a transmit buffer, the script must provide a transmit-sequence 
number. This sequence number must be distinct from the sequence number 
currently assigned to every other currently-open transmit buffer. This 
sequence number is used to identify the buffer for subsequent calls to the 
xmtfrm and rtnxfrm primitives. The xmtfrm primitive initiates transmission of 
the specified buffer, using the control information specified by a previous 
setctl primitive. Transmission proceeds asynchronously. The script can test
for completion of an output transfer by means of the xmtbusy primitive. 
Open transmit buffers can be transmitted any number of times. When the 
script decides that a buffer has successfully been received at the 
destination, it notifies the interpreter by means of the rtnxfrm primitive. This
causes the buffer to be placed on the transmit-buffer-return queue; the 
buffer is then no longer considered to be open and the sequence number 
can be reused. The driver is notified as soon as possible that the buffer has 
been closed. The buffer is then removed from the transmit-buffer-return 
queue. 


When a receive buffer is passed to the ICP by the driver, the buffer is 
placed on the empty-receive-buffer queue. When the first byte of a new 
frame arrives, an empty receive buffer is obtained from the empty-receive- 
buffer queue and the incoming characters are placed into the buffer as they 
arrive. An incoming frame will be discarded if the frame is too short (less 
than four bytes including CRC), if the frame is too long to fit in the receive 
buffer, or if the CRC is incorrect. If a frame is received successfully, the 
buffer is placed on the completed-receive-frame queue; otherwise the buffer
is returned to the empty-receive-buffer queue. When the script executes a
rcvfrm primitive, the buffer at the head of the completed-receive-frame
queue is removed from that queue and becomes the current receive buffer.
If the script subsequently executes a rtnrfrm primitive before executing
another rcvfrm primitive, the current receive buffer is placed on the
receive-buffer-return queue. If the script executes a rcvfrm primitive before
executing a rtnrfrm primitive, the current receive buffer, if any, is returned to
the empty-receive-buffer queue. Buffers on the receive-buffer-return queue
are returned to the driver at the first opportunity. If the empty-receive-buffer
queue is empty when the first byte of a new frame is received, the first five
bytes of the frame are retained in a staging area and the remainder of the 
frame is discarded. This allows a protocol script to receive a control frame 
(up to seven bytes including CRC) when no data buffer is available. When 
the next rcvfrm primitive is executed, the script will receive the information in 
the staging area along with an indication that the remainder of the frame has 
been discarded. If another frame arrives while the staging area is thus 
occupied, the new frame is discarded entirely. 


A count is kept of the number of frames discarded for each reason. These 
counters may be read and reset from the host computer. 
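

The discard conditions for an incoming frame can be summarized in a short
shell function. This is a sketch under stated assumptions: frame_fate is a
hypothetical name, the frame length (including CRC) is compared directly
with the buffer size, and the order in which the three conditions are
tested is not specified in the text.

```shell
# frame_fate LENGTH BUFSIZE CRC_OK
# Hypothetical helper: report what happens to a received frame.
#   LENGTH  = frame length in bytes, including CRC
#   BUFSIZE = size of the receive buffer in bytes (assumed comparable)
#   CRC_OK  = 1 if the CRC check passed, anything else if it failed
frame_fate() {
    if [ "$1" -lt 4 ]; then
        echo "discarded: too short"
    elif [ "$1" -gt "$2" ]; then
        echo "discarded: too long"
    elif [ "$3" -ne 1 ]; then
        echo "discarded: bad CRC"
    else
        echo "queued"             # goes to the completed-receive-frame queue
    fi
}
```

Each "discarded" outcome corresponds to one of the per-reason counters
mentioned above.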


8.1.1 The VPM Split Driver 


Since the VPM interpreter and a protocol script generally use most of the 
memory of the ICP, any higher levels of protocol that are required must be
executed by the host CPU. The purpose of the VPM split driver is to 
provide a framework in which higher-level protocols can be implemented 
conveniently using low-level routines in the VPM driver to communicate with 
the interpreter in the ICP. 


A set of functions has been written that provides a general-purpose interface 
to the link-level protocol being executed by the interpreter in the ICP. Their 
capabilities include a means to queue transmit and empty receive buffers for 
use by the protocol script in the ICP, to start and stop the script, and to 
send commands to and receive reports from the script. A means of getting 
a copy of and resetting the VPM interpreter’s error counters is also provided. 
These functions will be referred to as interface functions or collectively as 
the interface module. Appendix 1 contains a description of each of these
routines. 


To implement higher levels of a protocol as a UNIX device driver, a set of 
routines must be written to implement the standard UNIX system calls: 
open, close, read, write, and ioctl, as well as the required protocol. These
routines will be referred to as protocol functions or collectively as a protocol
module. The standard VPM driver does not implement a higher-level
protocol but instead provides a transparent user interface that can be used 
by applications that supply their own higher levels of protocol. This driver 
can be used as an example for those interested in writing a different 
protocol module. Appendix 2 contains a description of these routines. 


At least two other protocol modules have been written thus far. They are
the Synchronous Terminal Interface [4, st(4)], and the BANCS THP 
Interface. 


VPM allows up to four different protocol modules to be executing 
simultaneously. One ICP and one interface-module minor device are required
for each protocol being executed. Any number of protocol modules may be
implemented, but no more than four can be in use at any one time since no
more than four ICPs are supported. In general, each protocol module can
have up to 256 minor devices. The VPM protocol module, however, can
have at most 16 minor devices; this restriction is due to the fact that the
minor device number of the VPM protocol module is used not only to specify 
the VPM minor device but also to specify the interface-module minor device 
and the ICP minor device. The low-order four bits of the protocol-module 
minor device number determine the protocol-module minor device; the next 
two bits determine the interface-module minor device; the next two bits 
determine the ICP minor device. 
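

The bit layout just described can be checked with shell arithmetic. The
function below is only an illustration; decode_minor is a hypothetical name
and the output format is arbitrary.

```shell
# decode_minor N
# Hypothetical helper: split an 8-bit VPM protocol-module minor device
# number into the three fields described in the text.
decode_minor() {
    pm=$(( $1 & 15 ))             # bits 0-3: protocol-module minor device
    im=$(( ($1 >> 4) & 3 ))       # bits 4-5: interface-module minor device
    icp=$(( ($1 >> 6) & 3 ))      # bits 6-7: ICP minor device
    echo "protocol=$pm interface=$im icp=$icp"
}

decode_minor 83                   # prints protocol=3 interface=1 icp=1
```

Four bits for the protocol-module field is what limits the VPM protocol
module to 16 minor devices.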


Transmit buffers and receive buffers are passed between the VPM 
interpreter, the interface module, and the protocol module by means of 
pointers to data structures known as buffer descriptors. The buffer- 
descriptor structure is defined as follows: 


struct vpmbd {
    short c_ct;          /* Buffer size */
    short d_adres;       /* Low-order 16 bits of buffer address */
    char d_hbits;        /* High-order 2 bits of buffer address */
    char d_sta;          /* Protocol-dependent */
    char d_type;         /* Protocol-dependent */
    char d_dev;          /* Protocol-dependent */
    struct buf *d_buf;   /* Pointer to system buffer descriptor */
    int d_bos;           /* Index of next byte in buffer */
    int d_vpmtdev;       /* Minor device number */
};


For empty receive buffers, c_ct must be equal to the buffer size in bytes; for 
transmit buffers, c_ct must be equal to the number of bytes to be 
transmitted. When a receive buffer is returned to the protocol module, c_ct 
is equal to the number of data bytes in the buffer. D_adres and d_hbits must 
contain an 18-bit MULTIBUS-mapped buffer address; the low-order 16 bits
must be in d_adres and the high-order two bits must be in the low-order two
bits of d_hbits. D_type, d_sta, and d_dev are protocol-dependent; when
using the BISYNC interpreter these three bytes may be read and modified
by the protocol script. See the discussion of getxbuf, getrbuf, rtnxbuf, and
rtnrbuf in vpmc(1C). D_buf contains a pointer to a system buffer descriptor;
this is used to return the buffer to the system buffer pool. D_bos is the index 
of the first byte in the buffer not yet returned to the user. D_vpmdev is the 
minor device number of the protocol-module minor device to which the 
buffer is allocated. 
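

The split of an 18-bit buffer address between d_adres and d_hbits can be
illustrated with shell arithmetic. The function name split_addr is
hypothetical, and the values are printed as unsigned decimal numbers rather
than stored in the C fields themselves.

```shell
# split_addr ADDR
# Hypothetical helper: split an 18-bit address into the values that would
# be placed in d_adres and in the low-order two bits of d_hbits.
split_addr() {
    lo=$(( $1 & 65535 ))          # low-order 16 bits -> d_adres
    hi=$(( ($1 >> 16) & 3 ))      # high-order 2 bits -> d_hbits
    echo "d_adres=$lo d_hbits=$hi"
}

split_addr 196613                 # prints d_adres=5 d_hbits=3
```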


8.1.2 The Trace Driver 


The trace driver provides a means by which a user program can receive 
trace information generated by the VPM driver and the protocol script to aid 
in debugging new protocol modules and protocol scripts. It may also be 
used to debug other drivers or system code not related to the VPM driver. 
This driver can be configured to have a number of minor devices. Each 
minor device provides a means by which a user program can read data 
generated by functions within the operating system. This data is recorded by 
calls to trsave as described in Appendix 3. Each call to trsave generates a 
unit of data known as an event record which consists of a channel number 
(one byte), a count (one byte) and count bytes of data. The channel number 
can be used to multiplex up to 16 data streams on each minor device.
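

The event record layout (channel byte, count byte, then count data bytes)
can be walked with a short shell loop. This is a sketch only: parse_events
is a hypothetical name, and the records are supplied as decimal byte values
on the command line rather than read from a trace minor device.

```shell
# parse_events BYTE ...
# Hypothetical helper: walk a stream of event records given as decimal
# byte values and print one line per record.
parse_events() {
    while [ "$#" -ge 2 ]; do
        chan=$1 count=$2
        shift 2
        data=""
        i=0
        # collect "count" data bytes, stopping early if the input runs out
        while [ "$i" -lt "$count" ] && [ "$#" -gt 0 ]; do
            data="$data $1"
            shift
            i=$((i + 1))
        done
        echo "channel=$chan count=$count data:$data"
    done
}

parse_events 3 2 10 20 0 1 255
```

The call shown prints one line for the two-byte record on channel 3 and one
for the single-byte record on channel 0.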


Associated with each minor device of the trace driver is a clist queue, which 
is used to save event records provided a user program has that minor 
device open and has enabled the channel to which the event records were 
written. Channels may be enabled in any combination, using the ioctl
command VPMTRCO. See the manual entry for trace(4). While a minor 
device read queue is full, event records for that minor device are discarded. 
Appendix 3 contains a description of each trace-driver routine. 


Minor device 0 of the trace driver is used by the VPM driver to record a
variety of debugging information generated within the VPM driver and also
to record the data generated by the trace primitive in a protocol script. Minor
device 1 of the trace driver is used to record the information generated by
the snap primitive in a protocol script. The vpmtrace and vpmsnap
commands are available for reading and formatting the data passed via
these two minor devices. These two commands are described in the
attached manual entry for vpmstart(1C). Appendix 4 contains a description
of the VPM driver event trace. 


8.1.3 Miscellaneous Improvements 


Two new primitives have been added to the protocol language to allow 
communication between the link-level protocol script in the ICP and a 
higher-level protocol implemented in a user program or a VPM protocol 
module. The getcmd primitive allows the script to receive a four-byte 
command from a user program or a protocol module. The standard VPM 
protocol module allows a user program to pass a command to the script via 
an ioctl system call. Other VPM protocol modules can pass a command to
the script by calling the vpmcmd routine in the VPM interface module. The
rtnrpt primitive allows the script in the ICP to send a four-byte report to a
protocol module or to a user program. The standard VPM protocol module
allows a user program to receive a script report by means of an ioctl system
call. A protocol module can receive reports from the interface module by
calling the vpmrpt routine of the VPM interface module.


The trace primitive of the protocol language has been augmented to allow
two arguments. The form with one argument is still supported; if only one 
argument is given, the second argument is assumed to be zero. A snap 
primitive has been added. This primitive causes four bytes of data from the 
script followed by a four-byte time stamp to be placed on the read queue for 
trace driver minor device 1. 


The timer primitive allows a script to initialize a timer or test its current
value. If the argument to timer is non-zero, the timer is initialized with the
value of the argument. The timer is decremented ten times a second until it
reaches zero. If the timer primitive is called with an argument of zero, it 
returns the current value of the timer. This value is zero if the timer has 
expired, otherwise non-zero. 


Previously, the interpreter would accept at most one transmit buffer and one
receive buffer at any given time. The interpreter will now accept up to four transmit
buffers and four receive buffers at a time. This applies to the bit-oriented 
(HDLC) interpreter only. 


8.2 Implementation 


This section has two parts: the first gives configuration guidelines for VPM 
and the ICPs and tells how to install and boot VPM; the second gives 
procedures for compiling and link-loading protocol scripts. 


8.2.1 Installing and Booting VPM 


Each ICP can support up to eight users. If VPM is also installed, an
additional dedicated ICP is required as the VPM. Therefore, a P/60 with 32
users and a VPM requires FIVE ICPs. A P/15, P/20 or P/35 with 8 users
and a VPM requires TWO ICPs.


For all systems, the lowest numbered ICP must be the VPM. Thus while 
VPM is operating, ports 0-7 may not be used as TTY ports; users’ TTY ports 
must be numbered beginning with 8. For example, on a P/60 with 16 users 
and VPM, the VPM uses ports 0-7 and users’ TTYs are numbered 8-23. 
The port assignments are changed by modifying the file /etc/inittab for use 
with VPM. 
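

The sizing and numbering rules above reduce to simple arithmetic, sketched
here as two shell functions. The names icps_needed and tty_range are
hypothetical; the figures of eight users per ICP and ports 0-7 reserved for
VPM come from the text.

```shell
# icps_needed USERS
# Hypothetical helper: ICPs required for USERS terminals (eight per ICP)
# plus one dedicated ICP for VPM.
icps_needed() {
    echo $(( ($1 + 7) / 8 + 1 ))
}

# tty_range USERS
# Hypothetical helper: user TTY port numbers when the lowest-numbered ICP
# (ports 0-7) is the VPM.
tty_range() {
    echo "8-$(( 8 + $1 - 1 ))"
}

icps_needed 32        # prints 5
tty_range 16          # prints 8-23
```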


If you want VPM in operation only intermittently, the VPM ICP can function 
to a limited extent as a TTY ICP; the single wire-wrapped device and the 
parallel port are unavailable. In other words, seven devices remain available 
on the VPM ICP when VPM is not operating. You can switch back and forth
by alternating between versions of /etc/rc that call different versions of
/etc/inittab.


Six basic steps are required to bring up VPM. The following sections 
describe each step.


1. Perform several small hardware changes.

2. Create the VPM devices.

3. Modify /etc/inittab.

4. Modify /etc/rc.

5. Boot Sys5.

6. Modify the VPM library for switched or constant carrier.


8.2.1.1 Hardware Installation 


The ICP(s) that are to be the VPM(s) require the following special hardware 
features. 


Note that port 2 on ICP0 is the recommended VPM port. Further references
to a port will be to port 2.


1. The line(s) that are to be synchronous require the Carrier Detect signal
and Clear to Send signal to be strapped on the ICP. (See Plexus
User's Manual.)


2. The pin-pairs 3 and 4 must be jumpered for synchronous transmission.
Set up the 10-pin jumper network to look like the diagram below.


3. Make sure your VPM ICP is part number 60-00079 Rev Z, 60-00079-1
(any rev), 60-00085 (any rev), or 60-00091 (any rev). Earlier ICPs
must be upgraded in order to be used for VPM.


4. Each port using synchronous transmission must be configured for
external clock. This is accomplished via a pair of three-pin jumper
networks for each port on that ICP. The jumper networks are
designated Tx and Xx, where ‘x’ means the port number.


a. If your VPM ICP is part number 60-00079 Rev Z, 60-00079-1
(any rev), or 60-00085 (any rev), the following describes the
procedure for configuring a port for external clocking.


Normally, the center pin (A) is jumpered to either outer pin (B or
C) using a two-pin female jumper block. However, configuring a
port for external clocking requires wirewrapping pin B to pin C on
both jumper networks for the port.


For example, to strap port 2 for external clock, do the following:

• Remove the two-pin jumper blocks from T2 and from X2.
• Using a wirewrap tool, connect T2-B to T2-C.
• Using a wirewrap tool, connect X2-B to X2-C.


b. If your VPM ICP is part number 60-00091 (any rev), the
procedure for configuring a port for external clocking is simpler.
On this ICP, the common pin (C) has been moved to facilitate
changing the clocking mode of the port. On previous ICPs, the
common was located between the other two pins (A and B),
making wirewrapping necessary for external clocking. On this
ICP, switching to external clocking only requires moving the
jumper blocks from pin pair A-C to B-C.


For example, to strap port 2 for external clock, do the following:

• Remove the two-pin jumper blocks from T2 and from X2.
• Jumper pins X2-B and X2-C using the jumper block.
• Jumper pins T2-B and T2-C using the jumper block.


c. The connection cable between the VPM ICP port and the
synchronous modem is a specially strapped Plexus RS232C
modem cable. The following RS232C modem cable leads must
be strapped:


VPM ICP Port          Modem Cable
RS232 Leads     to    RS232 Leads

     2                     3
     3                     2
     4                     5
     5                     4
     6                    20
     7                     7
     8                     8
    15                    15
    17                    17
    20                     6


d. If your ICP is part number 60-00085 rev H or later, you must set
switches on switchpak D1 (may also be labeled U-51). If your
VPM ICP is port 0-3, switch 3 must be on. If your VPM ICP is
port 4-7, switch 4 must be on.


8.2.2 Create the VPM Devices 


The installation of Sys5.2 automatically creates the VPM devices. The 
following is an explanation of how they were created. 


Login as root and bring your system to init state 1. Then use mknod(1M) to 
create a node for each VPM line and each ICP: 


/etc/mknod /dev/vpm? c <major> <minor> 
/etc/mknod /dev/ic? c <major> <minor> 


where major is 18 and minor is defined as follows: the two most significant
bits denote the physical ICP number (0-3), the next two bits denote the VPM
protocol number (0-3), and the four least significant bits denote the
physical line number on the ICP. For example, if ICPs 0 and 1 are to be
used for VPM using protocol number 1 and line number 3, then the minor
device numbers should be 023 and 0123, respectively. Input may be in
decimal or octal.


For example, the mknod step might proceed as follows: 
mknod /dev/vpm2 c 18 2 
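
The minor device number layout described above can be checked with a
small C sketch. The function name vpm_minor is hypothetical; only the bit
layout comes from the text. It assumes the eight-bit minor number is packed
as ICP number, protocol number, and line number from most to least
significant bits.

```c
#include <assert.h>

/* Pack a VPM minor device number as described in the text:
 *   bits 7-6 = physical ICP number (0-3)
 *   bits 5-4 = VPM protocol number (0-3)
 *   bits 3-0 = physical line number on the ICP */
int vpm_minor(int icp, int proto, int line)
{
    assert(icp >= 0 && icp <= 3);
    assert(proto >= 0 && proto <= 3);
    assert(line >= 0 && line <= 15);
    return (icp << 6) | (proto << 4) | line;
}
```

With these assumptions, vpm_minor(0, 1, 3) and vpm_minor(1, 1, 3) give the
octal values 023 and 0123 quoted above, and vpm_minor(0, 0, 2) gives the
minor number 2 used in the mknod example.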


If TTY devices have been displaced by the new VPM ICP, you must do
mknods for these TTYs to link them to a different ICP.


8.2.3 Modify /etc/inittab 


Change the logical device assignments in /etc/inittab so that only VPM
devices (/dev/vpm0 - /dev/vpm7) are assigned ports 0-7 on ICP0. TTY
devices formerly assigned these ports should receive port assignments on
different ICPs. The lines for ports 0-7 on ICP0 should look like this:


2:00:off:/etc/getty tty0 b
2:01:off:/etc/getty tty1 b
2:02:off:/etc/getty tty2 b


2:07:off:/etc/getty tty7 b


Note that logins are disabled on VPM ports.


If port 2 on ICP0 is moved to port 2 on ICP 3 (port 26 from the system's
point of view), the old line


2:02:respawn:/etc/getty tty2 b


should be changed to


2:26:respawn:/etc/getty tty26 b
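
The system-wide port number used in the example (port 2 on ICP 3 becomes
port 26) is consistent with eight ports per ICP. The C sketch below shows
that mapping; the name system_port is hypothetical, and the formula is
inferred from the examples in this section rather than taken from the
Plexus sources.

```c
#include <assert.h>

#define PORTS_PER_ICP 8   /* ports 0-7 on each ICP */

/* System-wide TTY port number for `port` (0-7) on ICP number `icp`. */
int system_port(int icp, int port)
{
    assert(icp >= 0);
    assert(port >= 0 && port < PORTS_PER_ICP);
    return icp * PORTS_PER_ICP + port;
}
```

Under this assumption, system_port(3, 2) is 26, and the first user TTY on
a system whose ICP0 is the VPM is system_port(1, 0), that is, port 8.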


8.2.4 Modify /etc/rc


Verify that your /etc/rc downloads your ICPs correctly when the system is 
brought to multi-user state. No download step is required for VPM ICPs. 
Since the VPM ICP must be the first (ICP0), your other ICP(s) must change
their device numbers and the lines in /etc/rc that download these ICPs must
reflect these new numbers. Find the lines that do the dnid command. The
lines should look like this:


/etc/dnid -d -f /usr/lib/dnid/icp -o /dev/icn -a 4000 


Verify that /etc/rc contains a line like this one for each of your ICPs. If it
does not, edit /etc/rc, adding line(s) for the missing ICP(s). Increment n as
appropriate to reflect the addition of the VPM ICP as ICP0.


8.2.5 Reboot 


Shut down and reboot normally, following the procedures in the Plexus
User’s Manual.


8.2.6 Switched or Constant Carrier 


VPM uses whatever library is in the file /usr/src/uts/m68/icp/libvpm.a. If
you require switched carrier, do nothing; the correct file is already in place.
If you require constant carrier, back up the file
/usr/src/uts/m68/icp/libvpm.a and copy the file
/usr/src/uts/m68/icp/libvpm.a.ccar to /usr/src/uts/m68/icp/libvpm.a.


8.2.7 Compiling and Loading VPM Scripts


This section gives the steps to compile and load VPM scripts.


1. You must be in the directory /usr/src/cmd/vpm, so issue the 
command 


cd /usr/src/cmd/vpm 


2. Then move your protocol script into this directory, renaming it 
vpmscript.r. 


mv <your protocol script name> vpmscript.r 


3. Execute the following command:


make -f vpmscript.mk


This compiles the script in vpmscript.r and link-loads this compiled
script with the rest of the VPM ICP kernel. The object file created in
this step is called vpm0; it is down-loadable into the VPM ICP.


4. To download vpm0, use either of the programs dnid(1) or
vpmstart(1).


5. You may combine up to four scripts in one download module. Scripts
to be combined in this way must be called proto?code.s, where ‘?’
represents a number from 0 to 3. To combine scripts, you must
modify the file vpmscript.mk, inserting instructions for each script
(proto?code.s) to be compiled into a proto?.s file, where ‘?’
represents a number from 0 to 3. The following lines accomplish this
compilation; note that this whole series of steps must be done for each
script. Therefore, you must copy these lines into the file
vpmscript.mk once for each script, making sure you make the
appropriate substitution of a number for the ‘?’.


(2) fgrep define sas_tempc > sas_define
(3) cat /usr/include/icp/opdef.h sas_tempc | /lib/cpp > tf
(4) /usr/lib/vpm/vratfor < tf > tg
(5) cp tg /usr/src/uts/m68/icp/vpmicp/proto?code.s
(6) cat sas_define proto?.s > th
(7) cp th /usr/src/uts/m68/icp/vpmicp/proto?.s


8.3 Appendix 1 - The VPM Interface Module


The VPM interface functions provide a general-purpose interface between a 
higher-level protocol implemented in a VPM protocol module and the link- 
level protocol script executed by the VPM interpreter in the ICP. The ICP 
driver is used by the interface functions to pass commands to and receive 
reports from the VPM interpreter. When reports are received by the interface 
module that must be passed on to the protocol module, the protocol 
module’s receive-interrupt routine (vpmtrint in the case of the standard VPM 
protocol module) is called. 


This appendix describes each interface function. Dev is an argument to 
many of the interface functions and has the same meaning for all but two of 
them; the low-order four bits of the argument are not used by the interface 
functions; the next two bits determine the interface module minor device 
number; the next two bits determine the ICP minor device. Although dev is 
declared as an int, only the low-order eight bits are meaningful at this time. 
In calls to the vpmtrace and vpmsnap routines, dev need not be a minor
device number since it is just saved as part of the event record. The 
definition of dev will not be repeated for each function. 
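
The layout of dev described above can be illustrated with a short C sketch
that splits the low-order eight bits into their fields. The structure and
function names (vpm_dev_fields, vpm_split_dev) are hypothetical; only the
bit positions are taken from the text.

```c
/* Fields of the dev argument, following the description in the text. */
struct vpm_dev_fields {
    int icp;    /* bits 7-6: ICP minor device */
    int ifmod;  /* bits 5-4: interface module minor device */
    int low;    /* bits 3-0: not used by the interface functions */
};

struct vpm_dev_fields vpm_split_dev(int dev)
{
    struct vpm_dev_fields f;

    dev &= 0377;              /* only the low-order eight bits are meaningful */
    f.icp = (dev >> 6) & 03;
    f.ifmod = (dev >> 4) & 03;
    f.low = dev & 017;
    return f;
}
```

For example, splitting the octal value 0123 under this layout gives ICP 1,
interface module 1, and a low-order field of 3.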


vpmcmd (dev, cmd)
int dev;
char *cmd;


This function passes a command to the script. Cmd is the address of a
four-byte array. The four bytes are passed to the VPM interpreter, which
saves them until the protocol script executes a getcmd primitive. Only the
most recent four bytes passed by a vpmcmd call are saved by the VPM
interpreter.


struct vpmbd *vpmdeq (clp)
struct clist *clp;


This function removes the buffer-descriptor pointer at the head of the queue
pointed to by clp and returns it to the caller. If the queue is empty, a null
pointer is returned.


vpmemptq (dev, bdp)
int dev;
struct vpmbd *bdp;


This function is used to pass an empty receive buffer for use by the
interpreter in the ICP. Bdp is a pointer to a buffer descriptor or null. If bdp is
not a null pointer, the buffer descriptor is appended to the
empty-receive-buffer queue for the interface module specified by dev. If the
VPM interpreter currently has room for another empty receive buffer, the
buffer at the head of the queue is removed and passed to the ICP. The sum of
the number of buffers on the empty-receive-buffer queue and the number of
receive buffers the VPM interpreter has in its queues is returned to the
caller. If bdp is a null pointer, the above sum is returned and nothing else is
done.


vpmenq (bdp, clp)
struct vpmbd *bdp;
struct clist *clp;


If bdp is a null pointer, the number of buffer-descriptor pointers on the clist
queue pointed to by clp is returned. If bdp is not a null pointer, the buffer
descriptor pointed to by bdp is appended to the clist queue pointed to by
clp and the number of pointers currently on that queue is passed as the
return value.


char *vpmerrs (dev, n) 
int dev, n; 


This function is used to read and reset error counters in the VPM interpreter. 
The function passes a GETECMD command to the VPM interpreter and 
blocks until the interpreter responds; this command causes the interpreter to 
copy its error counters to an array in the interface module and send a 
completion report to the driver. After the copy operation is completed, a 
pointer to the error-count array is passed to the caller as the return value. 
The second argument is not currently used. 


char *vpmrpt(dev) 
int dev; 


This function is used to receive a script report from the ICP. When the 
protocol script executes a rtnrpt primitive, four bytes of data are passed to 
the interface module. If a rtnrpt has been executed by the protocol script
since the last call to vpmrpt, a pointer to the four bytes passed by the most
recent rtnrpt primitive is returned; otherwise zero is returned.


vpmsave (type, dev, word1, word2)
char type, dev;
short word1, word2;


This function creates an event record with the following structure:


struct {
        short c_sequn;   /* Sequence number */
        char c_type;     /* Argument type */
        char c_dev;      /* Argument dev */
        short c_word1;   /* Argument word1 */
        short c_word2;   /* Argument word2 */
}


This event record is passed to the trace driver using trsave.


vpmsnap (type, dev, word1, word2) 
char type, dev; 
short word1, word2; 


This function is similar to vpmsave. The only difference is that a time stamp
(long s_lbolt) is added to the event record after word2. A protocol script may
generate a time-stamped event record by executing the snap primitive.
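
The two record layouts described for vpmsave and vpmsnap can be sketched
in C as follows. The field names are those given in the text; the helper
functions make_event and make_snap_event and the counter next_sequn are
hypothetical, since the text does not show how the driver keeps its
sequence numbers or obtains the time stamp.

```c
/* Event record created by vpmsave, as laid out in the text. */
struct vpm_event {
    short c_sequn;   /* sequence number */
    char  c_type;    /* argument type */
    char  c_dev;     /* argument dev */
    short c_word1;   /* argument word1 */
    short c_word2;   /* argument word2 */
};

/* vpmsnap record: the same fields followed by a time stamp. */
struct vpm_snap_event {
    struct vpm_event ev;
    long s_lbolt;
};

static short next_sequn;   /* hypothetical sequence counter */

struct vpm_event make_event(char type, char dev, short word1, short word2)
{
    struct vpm_event ev;

    ev.c_sequn = next_sequn++;
    ev.c_type = type;
    ev.c_dev = dev;
    ev.c_word1 = word1;
    ev.c_word2 = word2;
    return ev;
}

/* `now` stands in for whatever clock value the driver would record. */
struct vpm_snap_event make_snap_event(char type, char dev,
                                      short word1, short word2, long now)
{
    struct vpm_snap_event s;

    s.ev = make_event(type, dev, word1, word2);
    s.s_lbolt = now;
    return s;
}
```

In this sketch each call consumes one sequence number, so consecutive
records can be ordered by c_sequn regardless of which routine created them.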


vpmstart (dev, type, rint)
int dev, type;
int (*rint)();


This function must be called on the first open of the protocol-module minor
device associated with the interface-module minor device and ICP identified
by dev. Type is a number that identifies the program running in the ICP and
must agree with the value specified when the ICP load module was loaded
into the ICP. For VPM interpreters, type is conventionally 6. Rint is the
name of a protocol-module routine to be called by the interface module
when it needs to return a transmit buffer, a receive buffer, a script report, or
an error-termination code. See the description of vpmtrint in appendix 2 for
an example of such a routine. Vpmstart sends a RUN command to the VPM
interpreter which causes it to begin execution of the protocol script. If the
interface module identified by dev is not configured, ENXIO is returned. If
the module is already running, i.e., vpmstart has been called and vpmstop
has not been called, or if the ICP is not running or was loaded using a
different magic number, EACCES is returned. A return value of zero
indicates a normal completion.


vpmstop (dev) 
int dev; 


This routine is called to halt the execution of the protocol script by the 
interpreter. The routine waits until the last transmit buffer has been returned 
by the protocol script (or 5 seconds have elapsed), then sends a HALT 
command to the VPM interpreter, which causes the interpreter to stop 
executing the protocol script. When the interpreter acknowledges the HALT 
command (or after 5 seconds), any transmit or receive buffers still enqueued on
the interface module’s transmit-and-empty-buffer queues are returned to the 
protocol module. This does not include buffers contained in the interpreter’s 
queues. Generally, when the protocol script is halted normally, the 
interpreter will have one or more empty receive buffers. If the interpreter or 
protocol script terminates in error, some transmit buffers may also remain 
unaccounted for. This means the protocol module must keep a record of all
buffers in use for each particular minor device, so that these buffers can be
returned to the pool of available buffers when that minor device is closed.


8.4 Appendix 2 - The VPM Protocol Module


This appendix gives a detailed description of the functions that make up the 
standard VPM protocol module. The description may be useful as a guide in 
writing other VPM protocol modules. The dev argument to the following 
routines is declared as an int; however, only the low-order eight bits are 
meaningful at this time. The low-order four bits are used to determine the 
minor device of the protocol module; the next two bits determine the minor 
device of the interface module; the next two bits determine the ICP minor 
device. 


vpmopen (dev, flag) 
int dev, flag; 


This function opens the protocol-module minor device specified by the low- 
order four bits of dev. Flag contains the option bits specified on the open 
system call. Exclusive or non-exclusive opens are permitted. If the driver is
opened for both reading and writing, the open is exclusive, i.e., no further
opens are permitted. If the driver is opened for reading only or for
writing only, the open is non-exclusive and subsequent opens for reading
only or writing only are permitted. If this device is not open when this
function is called, it obtains a number of non-addressable system buffers to
be used as receive buffers and passes them to the VPM interpreter using
the interface routine vpmemptq. Vpmopen also calls the interface routine
vpmstart if the minor device was not already open.


vpmclose (dev) 
int dev; 


This function closes the minor device specified by the low-order four bits of 
dev. It calls the interface routine vpmstop, flushes the receive queue for the
specified minor device, releases its buffers, and reinitializes its data 
structure. 


vpmwrite (dev) 
int dev; 


This function implements the write system call. If the transmit queue is not 
full, the function obtains a non-addressable system buffer, copies up to 512 
bytes of the user's write data into it, and enqueues the buffer on the level 2 
transmit queue using the interface function vpmxmtq. These steps are
repeated until all of the user's write data has been copied. If the transmit 
queue is full when this function is called or if it becomes full while the 
function is executing, the calling process is blocked until there is room in the 
queue for more transmit buffers. 
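
The way vpmwrite divides a user's write into buffers of up to 512 bytes can
be illustrated with the following C sketch. The names VPM_XMIT_CHUNK,
vpm_write_buffers, and vpm_write_chunk_len are hypothetical; the sketch
covers only the size arithmetic and leaves out buffer allocation, queueing,
and blocking.

```c
#include <assert.h>
#include <stddef.h>

#define VPM_XMIT_CHUNK 512   /* from the text: up to 512 bytes per buffer */

/* Number of transmit buffers needed for a write of `nbytes` bytes. */
size_t vpm_write_buffers(size_t nbytes)
{
    return (nbytes + VPM_XMIT_CHUNK - 1) / VPM_XMIT_CHUNK;
}

/* Number of bytes placed in the i-th buffer (counting from 0). */
size_t vpm_write_chunk_len(size_t nbytes, size_t i)
{
    size_t left;

    assert(i < vpm_write_buffers(nbytes));
    left = nbytes - i * VPM_XMIT_CHUNK;
    return left < VPM_XMIT_CHUNK ? left : VPM_XMIT_CHUNK;
}
```

A 1200-byte write, for instance, would occupy three buffers of 512, 512,
and 176 bytes under these assumptions.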


vpmread (dev) 
int dev; 


This function implements the read system call. When it is called, the calling 
process is blocked until the receive queue is non-empty. As data is received 
by the VPM interpreter, it is placed into an empty receive buffer. When the 
protocol script decides that the data contained in a particular buffer is valid, 
it executes a rtnbuf (BISYNC) or rtnfrm (HDLC) primitive, which causes the
buffer descriptor pointer to be passed to the interface module’s interrupt 
routine. The interface module then passes the buffer descriptor pointer to 
the protocol module by calling the protocol module's interrupt routine. The 
protocol module enqueues the buffer descriptor pointer on the receive queue 
and wakes up (unblocks) the reader(s). The number of bytes requested, or 
the data in one buffer, whichever is less, is copied to the user process; the 
number of bytes copied is passed as the return value. Any bytes remaining 
in a buffer are used to satisfy subsequent read requests. 
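
The copy rule described for vpmread (the lesser of the requested count and
the data in one buffer, with leftover bytes kept for later reads) can be
sketched in C as follows. The structure rx_buffer and the function rx_read
are hypothetical stand-ins; the real buffer descriptor (struct vpmbd) is not
reproduced in this guide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one receive buffer. */
struct rx_buffer {
    const char *data;   /* start of the received data */
    size_t      len;    /* total bytes in the buffer */
    size_t      used;   /* bytes already handed to earlier reads */
};

/* Copy min(requested, remaining) bytes to dst; return the count copied. */
size_t rx_read(struct rx_buffer *b, char *dst, size_t requested)
{
    size_t remaining, n;

    assert(b->used <= b->len);
    remaining = b->len - b->used;
    n = requested < remaining ? requested : remaining;
    memcpy(dst, b->data + b->used, n);
    b->used += n;       /* leftover bytes satisfy the next read */
    return n;
}
```

With an eight-byte buffer, two successive five-byte reads would return five
and then three bytes in this sketch.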

vpmioctl (dev, cmd, arg, mode)
int dev, cmd, mode;
char *arg;


This function implements the ioctl system call. Cmd determines the function
to be performed as follows:


VPMCMD - Pass a command to the protocol script. The first four 
bytes of the array pointed to by arg are passed to the VPM 
interpreter which saves them and passes them to the protocol 
script the next time it executes a getcmd primitive. 


VPMERRS - Get and reset the VPM interpreter’s error counters.
The eight-byte array containing the VPM interpreter’s error
counters is copied to the user array pointed to by arg. The
interpreter’s copy of the error counters is then set to zero.


VPMRPT - Get a report from the protocol script. If the protocol
script has executed a rtnrpt primitive since the last time this ioctl
command was issued, the script report (four bytes) is copied to the
user array pointed to by arg and one is passed as the return value;
otherwise, zero is passed as the return value.


The mode argument is not used. The values for VPMCMD, VPMERRS, and
VPMRPT are defined in file /usr/include/sys/vpm.h.


vpmtrint (dev, code, bdp) 
int dev, code; 
struct vpmbd *bdp; 


The address of this function is passed to the interface module using the
vpmstart function described in Appendix 1. This routine is called from the
interface module to return transmit buffers, receive buffers, script reports, or
error termination codes. It is usually called at interrupt priority and therefore
must not sleep or do unnecessary work. Code identifies the purpose of the
call and determines the meaning of bdp as follows:


RRTNXBUF - Bdp is a pointer to the buffer descriptor for a 
transmit buffer. This call is made when the protocol script executes 
a rtnxbuf (BISYNC) or a rtnxfrm (HDLC). 


RRTNRBUF - Bdp is a pointer to the buffer descriptor for a receive 
buffer. This call is made when the protocol script executes a rtnbuf 
(BISYNC) or a rtnfrm (HDLC). 


RRTNEBUF - Bdp is a pointer to the buffer descriptor for an empty
receive buffer. This call is used to return empty receive buffers
when the interface module is stopped by calling vpmstop.


ERRTERM - Bdp is the error-termination code passed to the 
interface module by the VPM interpreter when it halts the protocol 
script because of an error condition. The meaning of these error 
codes is given in the attached manual entry for vpm(4).


The values for RRTNXBUF, RRTNRBUF, RRTNEBUF, and ERRTERM are
defined in the file /usr/include/sys/vpm.h.


8.5 Appendix 3 - The Trace Driver


The trace driver provides a means by which a user program can receive 
trace information generated by the VPM driver, a protocol script, or some 
other driver. See the attached manual entry for trace(4). 


A description of each routine of the trace driver follows. 


tropen (dev) 
int dev; 


This function opens the minor device specified by dev exclusively. 


trclose (dev) 
int dev; 


This function closes the minor device specified by dev. It discards any data
on the read queue and initializes the data structure associated with the
minor device.


trread (dev) 
int dev; 


This function implements the read system call; it sleeps until at
least one event record is available on the read queue associated with dev. It
then copies records to the user until the user’s read count is less than the 
number of bytes in the next event record or until the read queue is empty. 
The number of bytes copied is passed as the return value. 


trioctl (dev, cmd, arg, mode) 
int dev, cmd, arg, mode; 


This function implements the ioctl system call. Cmd indicates the operation
to be performed. The driver has one command: 


VPMTRO - Enable a trace channel. In order for data to be saved 
on the read queue for minor device dev, the device must be open 
and the channel to which it is written must be enabled. This 
command enables channel arg, which must be in the range 0 to 15.
Any combination of channels may be enabled by repeatedly calling 
this function with different values of arg. All channels are disabled 
when the minor device is closed. 
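
One way to picture the channel-enable behavior described above is as a
16-bit mask, one bit per channel. The C sketch below assumes that
representation; the structure tr_state and the functions tr_enable,
tr_is_enabled, and tr_disable_all are hypothetical and are not taken from
the trace driver source.

```c
#define TR_NCHAN 16   /* channels 0 to 15, per the text */

/* Hypothetical per-minor-device state: one bit per enabled channel. */
struct tr_state {
    unsigned int enabled;
};

/* Enable channel chno; returns 0 on success, -1 if chno is out of range. */
int tr_enable(struct tr_state *s, int chno)
{
    if (chno < 0 || chno >= TR_NCHAN)
        return -1;
    s->enabled |= 1u << chno;
    return 0;
}

/* Nonzero if channel chno is currently enabled. */
int tr_is_enabled(const struct tr_state *s, int chno)
{
    if (chno < 0 || chno >= TR_NCHAN)
        return 0;
    return (int)((s->enabled >> chno) & 1u);
}

/* Closing the minor device disables every channel. */
void tr_disable_all(struct tr_state *s)
{
    s->enabled = 0;
}
```

Repeated calls to tr_enable with different channel numbers accumulate in
the mask, which matches the statement that any combination of channels may
be enabled.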


trsave (dev, chno, buf, ct) 
char dev, chno, *buf, ct; 


If minor device dev of the trace driver is open and channel chno of that
minor device is enabled, then chno and ct, followed by ct bytes starting at
address buf, are copied onto the read queue associated with dev, provided
the read queue for that device has room for the complete event record. If
not, the record is discarded.
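
The all-or-nothing queueing rule described for trsave can be sketched in C
as follows. The queue structure tr_queue, its capacity TRQ_SIZE, and the
function trq_save are hypothetical; the sketch shows only the record layout
(chno, ct, then ct data bytes) and the discard rule, and omits the checks
that the device is open and the channel enabled.

```c
#include <stddef.h>
#include <string.h>

#define TRQ_SIZE 64   /* hypothetical queue capacity in bytes */

/* Hypothetical read queue for one trace minor device. */
struct tr_queue {
    unsigned char bytes[TRQ_SIZE];
    size_t        used;
};

/* Append one event record: chno, ct, then ct bytes from buf.
 * Returns 1 if queued, 0 if the record was discarded for lack of room. */
int trq_save(struct tr_queue *q, unsigned char chno,
             const unsigned char *buf, unsigned char ct)
{
    size_t need = 2 + (size_t)ct;   /* two header bytes plus the data */

    if (need > TRQ_SIZE - q->used)
        return 0;                   /* no room for the complete record */
    q->bytes[q->used++] = chno;
    q->bytes[q->used++] = ct;
    memcpy(q->bytes + q->used, buf, ct);
    q->used += ct;
    return 1;
}
```

Because the room check happens before anything is copied, a record is never
queued partially in this sketch.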


8.6 Appendix 4 - The VPM Event Trace


Calls to the interface routine vpmsave have been placed strategically
throughout the standard VPM protocol module (vpmt.c) and the VPM
interface module (vpmb.c) to provide an event trace for debugging new
protocol modules and/or protocol scripts. A protocol script may generate an
event record by executing a trace primitive. All such event records are
discarded unless some user program has opened minor device 0 of the
trace driver and enabled channel 0 of that minor device. The command
vpmtrace(1C) opens this device and enables channel 0, then reads event
records and prints them on the standard output as they are received. Each
kind of event record that is generated by the VPM driver will be described by
giving the vpmsave function call as it appears in vpmt.c or vpmb.c, followed
by an example of the line printed by vpmtrace as a result of this call.
Following this, the context of the vpmsave call and the definition of the
parameters passed will be given. The definition of a parameter that appears
in more than one call will not be repeated. The first five calls to vpmsave
occur in the source file vpmt.c; the remaining calls occur in vpmb.c.


vpmsave ('p', dev, ec, 0)
243 p 100 15 0


Called if vpmstart returns an error code. The first field of the printed record
contains a sequence number assigned by vpmsave. The remaining four
fields contain the four remaining arguments to vpmsave in the same order
as they appear in the call to vpmsave. The first argument to vpmsave, in
this case a 'p', identifies the record type. Dev is the minor device number as
defined earlier; ec is the value returned by vpmstart.


vpmsave ('o', dev, vp->vt_state, 0)
244 o 100 1 0


Called just before the normal return point of vpmopen. The variable
vp->vt_state contains the state bits for the protocol module. Refer to the
source file, vpmt.c, for the definitions of the state bits.


vpmsave ('c', dev, vp->vt_state, 0)
245 c 100 13 0


Called from vpmclose just before the state bits are initialized.


vpmsave ('w', dev, ct, dp)
246 w 100 1000


Called just before putting a buffer-descriptor pointer on the transmit queue in
vpmwrite. Ct is the number of bytes in the buffer. When executing on a
PDP11, dp is the pointer to the buffer descriptor; dp is not meaningful when
executing on a VAX because pointers are four bytes on a VAX and the
argument corresponding to dp is declared as a short.


vpmsave ('r', dev, cnt, dp->d_bos)
247 r 100 500 500


Called from vpmread just after cnt bytes have been moved to the user’s
read buffer. The parameter dp->d_bos is the number of bytes remaining in
the current receive buffer.

vpmsave ('s', dev, vp->vb_state, 0)
248 s 100 401 0


Called just before the normal return from vpmstart. The parameter
vp->vb_state contains the state bits for the interface module. For the
definitions of the state bits, refer to the source file vpmb.c.


vpmsave ('t', dev, vp->vb_state, vp->vb_xbkmc)
249 t 100 0 0


Called just before the normal return from vpmstop. The parameter
vp->vb_xbkmc is the number of transmit buffers currently held by the VPM
interpreter. It can be non-zero if the protocol script or interpreter terminates
in error.


vpmsave ('X', dev, vp->vb_xbkmc, 0)
250 X 100 1 0


Called from vpmbrint, the interface module's receive-interrupt routine, each
time the VPM interpreter returns a transmit buffer.


vpmsave ('R', dev, vp->vb_rbkmc, 0)
251 R 100 1 0


Called from vpmbrint each time the VPM interpreter returns a receive buffer.
The parameter vp->vb_rbkmc contains the number of receive buffers
currently held by the interpreter.


vpmsave ('T', dev, sel4, sel6)
252 T 100 370 21 34


Called from vpmbrint when a trace report is received from the interpreter.
This occurs when the protocol script executes a trace primitive. Sel4
contains the value of the script location counter (plus two) at the time the
trace primitive was executed. By referring to the assembly-language listing
of the protocol script generated by the -l option of vpmc, the point in the
protocol script at which the trace was executed can be determined. The
value of the location counter is two greater than the location of the trace
instruction as shown in the assembly-language listing. Sel6 contains the
byte or bytes passed by the trace primitive. Vpmtrace prints these two bytes
in separate fields.


vpmsave ('E', dev, sel4, sel6)
253 E 244 21


Called from vpmbrint when an error-termination report is received from the
interpreter. Sel4 contains the script location counter at the time execution of
the script was terminated. Sel6 contains the termination code. For an
explanation of these codes see the attached manual entry for vpm(4).


vpmsave ('P', dev, sel4, sel6)
254 P 100 2105 1055


Called from vpmbrint when a script report is received from the interpreter.
This occurs when the protocol script executes a rtnrpt primitive. Sel4 and
sel6 contain the four bytes transferred by this primitive.


vpmsave ('F', dev, sel4, sel6)
255 F 100 3 0


Called from vpmbrint when an error-count report is received from the
interpreter. Sel4 and sel6 do not contain any meaningful data for this event
type.


vpmsave ('S', dev, sel4, sel6)
256 S 100 401 0


Called from vpmbrint when a start-up report is received from the interpreter.
The low-order eight bits of sel4 contain a parameter defining the maximum
number of transmit buffers the interpreter can accept; the high-order eight
bits contain a parameter defining the maximum number of receive buffers.
Sel6 contains the options supported by the interpreter.
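
The packing of the two buffer limits into sel4 in a start-up report can be
illustrated with a short C sketch. The names vpm_limits and
vpm_startup_limits are hypothetical; the sketch assumes sel4 is a 16-bit
quantity and makes no assumption about the radix in which vpmtrace prints
it.

```c
/* Buffer limits carried in the sel4 word of a start-up report. */
struct vpm_limits {
    int max_xmit;   /* low-order eight bits of sel4 */
    int max_recv;   /* high-order eight bits of sel4 */
};

struct vpm_limits vpm_startup_limits(unsigned short sel4)
{
    struct vpm_limits lim;

    lim.max_xmit = sel4 & 0xff;
    lim.max_recv = (sel4 >> 8) & 0xff;
    return lim;
}
```

For an interpreter that accepts four transmit and four receive buffers,
sel4 would carry the value 0x0404 under this layout.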


vpmsave ('C', dev, vp->vb_state, bp->xbkmc)
257 C 100 1 0


Called from vpmclean just before the data structure associated with dev is
initialized.


9. UNIX SYSTEM REMOTE JOB ENTRY


This chapter contains information on the design and operation of the Sys5 
UNIX Remote Job Entry (RJE). In this document, RJE refers to the facilities 
provided by the UNIX operating system and not to the remote job entry feature
of the HASP and JES2 subsystems produced by International Business 
Machines (IBM). 


The information contained in this chapter should be used to augment the 
information contained in the Sys5 UNIX Administrator Reference Manual 
[rje(8)]. There will be assumptions made concerning allocation of
responsibilities between UNIX system and IBM operations, hardware 
configuration, etc. Although these assumptions may not fully apply to your 
location, they should not interfere with the intent of this document. 


The major topics discussed in this document are as follows: 


• SETTING UP - Hardware requirements and RJE generation on the
IBM and UNIX systems.


• DIRECTORY STRUCTURES - The controlling RJE directory structure
and a typical RJE subsystem directory structure.


• RJE PROGRAMS - Programs that make up an RJE subsystem.


• UTILITY PROGRAMS - Programs available for debugging or tracing.


• RJE ACCOUNTING - The accounting of jobs done by RJE and some
methods for using this accounting data.


• TROUBLESHOOTING - Error recovery and procedures for identifying
and fixing RJE problems.


9.1 Facilities 


Discussions will focus on a hypothetical RJE connection between a UNIX 
system, whose nodename is pwba, and an IBM 370/168, referred to as B.
We also assume that pwba is connected to an IBM 370/158, referred to as 
C. The UNIX operating system machine emulates an IBM System/360 
remote multileaving work station. 


9.2 Setting Up


9.2.1 Hardware


In the remainder of this guide, the hardware described below will be referred 
to as the physical device; and its name will be referred to as device?,
where ? is the device number. 


On DEC computers, RJE requires the use of a KMC11-B microprocessor to
control either a single-line interface or eight-line interface. For KMC11-B
control of a single RJE line, the following hardware is required:


• KMC11-B Microprocessor - used to drive the RJE line.


• DMC11-DA or DMC11-FA line unit - the DMC11-DA interfaces with
Bell 208 and 209 synchronous modems or equivalent. The DMC11-FA
interfaces with Bell 500 A L1/5 synchronous modems or equivalent.


Each KMC/DMC pair supports a single RJE line that may operate at speeds
up to 56 KB. On the KMC11 line unit, the NO CRC switch (switch S2 in
switch pack number 1) should be in the ON position.


For KMC11-B control of from one to eight RJE lines, the following hardware 
is required: 


• KMC11-B Microprocessor - used to drive the RJE line.
• DMS11-DA - Eight-line synchronous communication multiplexor.
• DM11-BA - Modem control multiplexor.


These three devices are collectively known as a KMS11. A KMS11 
supports up to eight low-speed (9.6KB or lower) RJE lines or up to four 
intermediate-speed (19.2KB or lower) lines. 


If a KMC/DMC pair is used for RJE, then the KMC11-B must be configured
on the host system. If a KMS11 is used, both the KMC11-B and the DM11-BA
must be configured on the host system. The use of a KMS11 requires
that dmkset(1M) be invoked (typically in /etc/brc) before loading the RJE
protocol script into the KMC11-B.


9.2.2 IBM Generation 


The following applies to the host IBM system. The remote line to the UNIX 
operating system machine should be described as a System/360 remote 
work station. The following parameters must be initialized and must agree 
with their counterparts on the UNIX operating system machine: 


• Number of printers (NUMPR)—The number of logical printers (up to 
seven). 


• Number of punches (NUMPU)—The number of logical punches (up to 
seven). 


• Number of readers (NUMRD)—The number of logical readers (up to 
seven). 




The JES2 parameters for the hypothetical connection to IBM system B are 
as follows: 


RMT5    S/360,LINE=5,CONSOLE,MULTI,TRANSP,NUMPR=5, 
        NUMPU=1,NUMRD=5,ROUTECDE=5 
R5.PR1  PRWIDTH=132 
R5.PR2  PRWIDTH=132 
R5.PR3  PRWIDTH=132 
R5.PR4  PRWIDTH=132 
R5.PR5  PRWIDTH=132 
R5.PU1  NOSUSPND 
R5.RD1  PRIOINC=0,PRIOLIM=14 
R5.RD2  PRIOINC=0,PRIOLIM=14 
R5.RD3  PRIOINC=0,PRIOLIM=14 
R5.RD4  PRIOINC=0,PRIOLIM=14 
R5.RD5  PRIOINC=0,PRIOLIM=14 


System pwba is referenced by line 5 (LINE=5), remote 5 (RMT5). It is 
defined as having a console for the rjestat(1C) command, five printers, one 
punch, and five readers. Although you may have up to seven printers or 
punches, the total number of printers and punches may not exceed eight. 
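

The limits just stated can be checked mechanically before the parameters 
are handed to the IBM administrator. The following Python sketch is 
illustrative only: the function name check_peripherals is hypothetical and 
is not part of RJE or JES2, and the lower bound of zero devices is an 
assumption. 

```python
def check_peripherals(numpr, numpu, numrd):
    """Return a list of problems with the requested logical device counts.

    Limits are taken from the text: at most seven each of printers,
    punches, and readers, and at most eight printers and punches combined.
    The lower bound of zero is an assumption.
    """
    problems = []
    for name, count in (("NUMPR", numpr), ("NUMPU", numpu), ("NUMRD", numrd)):
        if not 0 <= count <= 7:
            problems.append(f"{name}={count} is outside the range 0-7")
    if numpr + numpu > 8:
        problems.append(f"NUMPR+NUMPU={numpr + numpu} exceeds 8")
    return problems


# The hypothetical RMT5 definition: five printers, one punch, five readers.
print(check_peripherals(5, 1, 5))
```

An empty list means the counts are within the stated limits. 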


The line is described as a transparent (TRANSP), multileaving (MULTI) line. 


The remaining information describes attributes of the printers, punches, and 
readers. 


Normally, separator pages are transmitted with IBM print files. The UNIX 
system RJE does not remove separator pages. To prevent transmission of 
separator pages on printer 1 of the previous example, its attributes would 
be: 


R5.PR1 PRWIDTH=132,NOSEP 


NOSEP should be included for all printers when separator pages are not 
desired. Most IBM systems can also be told via a console command to 
cancel transmission of separator pages on printers. This can be done from 
the IBM system console or from the remote UNIX operating system machine 
via rjestat. For example, the following JES2 command would cancel 
separator page transmission on printer 1: 


$TR5.PR1,S=N 


9.2.3 Sys5 UNIX Generation 


If the RJE remote dialing facility is to be used, the administrator must make 
sure that the definition for the RJECU in the file /usr/include/rje.h is the 
device to be used for remote dialing. By convention, RJECU is defined to 
be /dev/dn2 for DEC processors. To compile and install RJE, the normal 
make(1) procedures are used (see the "Setting up the Sys5 UNIX" chapter 
of this guide). Once an RJE subsystem has been installed, the remote line 
must be described in the configuration file /usr/rje/lines. This file, as it 
exists on the hypothetical system pwba, is as follows: 


B pwba /usr/rje1 rje1 vpm0 5:5:1 1200:512:y 
C pwba /usr/rje2 rje2 vpm1 1:1:1 1200:512 


The /usr/rje/lines file is accessed by all components of RJE. Each line of the 
table (maximum of eight) defines an RJE connection. Its seven columns 
may be labeled host, system, directory, prefix, device, peripherals, and 
parameters. These columns are described as follows: 




• host—The IBM system name, e.g., A, B, C. This string can be up to 
six characters long. 


• system—The UNIX system nodename [see uname(1)]. 


• directory—The directory name of the servicing RJE subsystem (e.g., 
/usr/rje2). 


• prefix—The string prepended to most files and programs in the 
directory (e.g., rje2). 


• device—The name of the controlling virtual protocol machine (VPM) 
device, with /dev/ excised. In order to specify a VPM device, all VPM 
software must be installed, and the proper special files must be made 
[see vpm(7) and mknod(1M)]. Also, the permission modes of the 
VPM device must be set by the system administrator to allow read 
and write access by the RJE software. 


• peripherals—Information on the logical devices (readers, printers, 
punches) used by RJE. There are three subfields. Each subfield is 
separated by “:” and is described as follows: 


1. Number of logical readers. 
2. Number of logical printers. 
3. Number of logical punches. 


The number of peripherals specified for an RJE subsystem must 
agree with the number of peripherals that have been described on the 
remote machine for that line. 


• parameters—This field contains information on the type of connection 
to make. Each subfield is separated by “:”. Any or all fields may be 
omitted; however, the fields are positional. All but trailing delimiters 
must be present. For example, in: 


1200:512:::9-555-1212:400 


subfields 3 and 4 are missing. Each subfield is defined as follows: 


1. space—This subfield specifies the amount of space (S) in 
blocks that RJE tries to maintain on file systems it touches. 
The default is 0 blocks. Send(1C) will not submit jobs and 
rjeinit issues a warning when less than 1.5S blocks are 
available; rjerecv stops accepting output from the host when 
the capacity falls to S blocks; RJE becomes dormant until 
conditions improve. If the space on the file system specified 
by the user on the “usr=” card would be depleted to a point 
below S, the file will be put in the job subdirectory of the 
connection’s home directory rather than in the place that the 
user requested. 


2. size—This subfield specifies the size in blocks of the largest 
file that can be accepted from the host without truncation 
taking place. The default is no truncation. Note the UNIX 
system has a default 1 megabyte file size limit. 


3. badjobs—This subfield specifies what to do with undeliverable 
returning jobs. If an output file is undeliverable for any reason 
other than file system space limitations (e.g., missing or invalid 
“usr=” card) and this subfield contains the letter y, the output 
will be retained in the job subdirectory of the home directory; 
and login rje is notified via mail(1). If this subfield has any 
other value, undeliverable output will be discarded. The 
default is “n”. 


4. console—This subfield specifies the status of the interactive 
status terminal for this line. If the subfield contains an i, the 
status console facilities of rjestat will be inhibited. In all cases, 
the normal noninteractive uses of rjestat will continue to 
function. The default is “y”. 


5. dial-up—This subfield contains a telephone number to be used 
to call a host machine. The telephone number may contain 
the digits 0 through 9 and the character “-”, which denotes a 
pause. If the telephone number is not present, no dialing is 
attempted; and a leased line is assumed. 


6. transmission block size—This subfield specifies the size (in 
bytes) of transmission blocks to be sent to the IBM host for a 
particular RJE subsystem. The maximum permitted block size 
is 512. The default value is also 512. 
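

The column layout and the positional parameter subfields described above 
can be illustrated with a short parser. This is a minimal Python sketch, 
not the code RJE itself uses. The names RjeLine, parse_line, PARAM_NAMES, 
and PARAM_DEFAULTS are hypothetical; it assumes the columns are separated 
by white space, that the parameters column may be absent when every 
subfield is omitted, and it represents "no truncation" for size as an 
empty string. 

```python
from dataclasses import dataclass

# Subfield order and defaults as given in the text.
PARAM_NAMES = ("space", "size", "badjobs", "console", "dialup", "blocksize")
PARAM_DEFAULTS = {"space": "0", "size": "", "badjobs": "n",
                  "console": "y", "dialup": "", "blocksize": "512"}


@dataclass
class RjeLine:
    host: str
    system: str
    directory: str
    prefix: str
    device: str
    readers: int
    printers: int
    punches: int
    params: dict


def parse_line(text):
    """Parse one entry of the lines file into an RjeLine."""
    fields = text.split()
    if len(fields) == 6:          # assumption: parameters column omitted
        fields.append("")
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")
    host, system, directory, prefix, device, periph, raw = fields
    readers, printers, punches = (int(n) for n in periph.split(":"))
    params = dict(PARAM_DEFAULTS)
    # Subfields are positional; an empty subfield keeps its default.
    for name, value in zip(PARAM_NAMES, raw.split(":")):
        if value:
            params[name] = value
    return RjeLine(host, system, directory, prefix, device,
                   readers, printers, punches, params)


entry = parse_line("B pwba /usr/rje1 rje1 vpm0 5:5:1 1200:512:y")
print(entry.prefix, entry.params["badjobs"])
```

Applied to the parameter string 1200:512:::9-555-1212:400, the sketch 
leaves badjobs and console at their defaults and picks up the telephone 
number and block size from positions 5 and 6. 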


When multiple readers have been specified, jobs that are submitted for 
transmission to IBM are assigned to the reader with the fewest cards on it. 
Each reader gets an equal amount of service. This prevents smaller jobs 
from having to wait for a previously submitted large job to be transmitted. 
When multiple printers or punches have been specified, returning jobs get 
assigned to free printers (or punches) allowing smaller output files to bypass 
large output files. 


Deciding how many peripherals to specify depends on the use of that RJE 
subsystem. If an RJE subsystem is heavily used for off-line printing (i.e., 
output does not return to the UNIX operating system machine), the 
administrator would want to specify multiple readers but would not have a 
need for multiple printers or punches. 


9.3 Directory Structures 
9.3.1 Controlling Directory 


The controlling directory used by RJE is /usr/rje. This directory contains 
RJE programs for use by separate RJE subsystems (e.g., rje1, rje2, rje3) 
and the shell queuer’s directory. Most RJE programs existing here have 
been compiled such that each RJE subsystem shares the text of these 
programs. A snapshot of this directory on our hypothetical machine is as 
follows: 


-rwxr-xr-x 3 rje  rje  4068 Mar  4 10:42 cvt 
-rw-r--r-- 1 rje  rje    42 Apr 10 09:52 lines 
-rwxr-xr-x 3 rje  rje 15096 Apr 10 13:01 rjedisp 
-rwxr-xr-x 3 rje  rje  2328 Mar  4 10:21 rjehalt 
-rwxr-xr-x 3 rje  rje 10396 Apr 15 10:07 rjeinit 
-r-x------ 3 rje  rje   785 Apr  8 09:00 rjeload 
-rwsr-xr-x 3 rje  rje  5040 Mar 27 09:28 rjeqer 
-rwxr-xr-x 3 rje  rje  4072 Apr  1 15:40 rjerecv 
-rwxr-xr-x 3 rje  rje  3888 Mar 27 09:35 rjexmit 
-rwsr-xr-x 1 root rje  2696 Mar 27 14:42 shqer 
-rwxr-xr-x 3 rje  rje  5920 Apr  2 15:47 snoop 
drwxr-xr-x 2 rje  rje    80 Mar 25 13:26 sque 


The RJE subsystems are generated in their own directory by linking the 
program names in this directory to the appropriate names in the subsystem 
directory. The programs are described in the part “RJE PROGRAMS”. The 
file lines is the configuration file used by all RJE subsystems. The directory 
sque is used by the shell queuer (shqer). This directory contains: 


-rw-r--r-- 1 rje rje 0 Feb 14 14:04 errors 
-rw-r--r-- 1 rje rje 0 Feb 14 14:04 log 


When shqer has work to do, the files log and errors will be of nonzero 
length; and temporary files (tmp) will also appear here. 


9.3.2 Subsystem Directory 


The RJE subsystem described in this part maintains the connection between 
pwba and IBM B and will be referred to as rje1. The first line of 
/usr/rje/lines describes rje1. As noted in this file, rje1 runs in the directory 
/usr/rje1. A snapshot of this directory is as follows: 


-rw-r--r-- 1 rje rje  4990 Apr 15 08:30 acctlog 
-rwxr-xr-x 3 rje rje  4068 Mar  4 10:42 cvt 
-rw-r--r-- 1 rje rje     0 Apr 15 04:02 errlog 
drwxrwxrwx 2 rje rje   192 Apr 10 09:51 job 
-rw-r--r-- 1 rje rje   194 Apr 15 08:11 joblog 
-rw-r--r-- 1 rje rje     0 Apr 15 08:11 resp 
-rwxr-xr-x 3 rje rje 15096 Apr 10 13:01 rje1disp 
-rwxr-xr-x 3 rje rje  2328 Mar  4 10:21 rje1halt 
-rwxr-xr-x 3 rje rje 10396 Apr 15 10:07 rje1init 
-r-x------ 3 rje rje   785 Apr  8 09:00 rje1load 
-rwsr-xr-x 3 rje rje  5040 Mar 27 09:28 rje1qer 
-rwxr-xr-x 3 rje rje  4072 Apr  1 15:40 rje1recv 
-rwxr-xr-x 3 rje rje  3888 Mar 27 09:35 rje1xmit 
drwxr-xr-x 2 rje rje   144 Apr 15 08:30 rpool 
-r--r--r-- 1 rje rje    14 Mar  4 10:21 signon 
-rwxr-xr-x 3 rje rje  5920 Apr  2 15:47 snoop0 
drwxrwxrwx 2 rje rje   176 Apr 10 13:03 spool 
drwxr-xr-x 2 rje rje   224 Apr 10 13:56 squeue 
-rw-r--r-- 1 rje rje     0 Apr 15 10:30 stop 
-rw-r--r-- 1 rje rje   274 Mar  7 20:25 testjob 


The programs rje1*, cvt, and snoop0 are linked to the corresponding 
programs in /usr/rje. The remaining files and their uses are as follows: 


• acctlog—Accounting data is stored in this file if it exists. This file is 
the responsibility of the RJE administrator. 


• errlog—Used by rje1 to log errors. It can be useful for debugging 
rje1 problems. 


• joblog—Used by rje1qer and rjestat to notify rje1xmit that a job (or 
console request) has been submitted. It also contains the process-group 
number of the rje1 processes. The program cvt can be used 
to convert this file to a readable form. 


• resp—Contains console messages received from IBM B. These 
messages can be responses for rjestat or IBM responses to 
submitted jobs (i.e., on reader messages). This file is truncated if it 
grows to a size greater than 70,000 bytes. 


• signon—A file that must be created by the system administrator and 
that should contain a character sequence of the form: 


/*SIGNON XXXXX 


The X’s should be replaced by the signon identification string 
(obtainable from the IBM host’s system administrator) that identifies 
this RJE subsystem to the IBM host system. 


• stop—Indicates that rje1halt has been executed. The existence of 
this file indicates to rjestat that rje1 has been halted by the operator. 


• testjob—A sample job that can be submitted to test the rje1 
subsystem. Initially, the job control statements may have to be 
changed to suit your IBM system. 


When rje1 terminates abnormally, the file dead should appear in this 
directory. This file contains a short message indicating why rje1 is not 
operating and is used by rjestat to report the problem. The remaining 
directories and their uses are as follows: 


• job—Used to save undeliverable jobs if the proper parameter has 
been specified in /usr/rje/lines. The sample job described above is 
also delivered to this directory. This directory should be mode 777. 


• rpool—Contains temporary files used to gather output from the 
remote machine. These files are named pr* (for print output files) 
and pu* (for punch output files). Once a complete file has been 
received, the file is dispatched in the proper way by rje1disp. 


• spool—Used by send to store temporary files to be submitted to the 
remote machine. This directory must be mode 777. 


• squeue—Used by rje1 to store submitted files until they are 
transmitted. The program rje1qer is used by send to move the 
temporary files in the spool directory to this directory. 


9.4 RJE Programs 


All programs described below, with the exception of rjestat, exist in /usr/rje. 
These programs are “shared text” and are linked (except shqer) to the 
proper names in each subsystem directory. The names described below 
are generic; the programs in the rje2 directory would be rje2qer, rje2init, 
etc. 


Each available RJE subsystem occupies three process slots. The slots 
used are rje?xmit for the transmitter, rje?recv for the receiver, and 
rje?disp for the dispatcher. One additional process slot is used for shqer 
regardless of how many subsystems are available. 


Each RJE subsystem tries to be self-sustaining and logs any errors 
encountered during normal operation in its errlog file. 


9.4.1 Rjeqer 


This program is used by send to queue files for transmission. When 
invoked, it performs the following steps: 


1. Moves the temporary pnch(4) format file in the spool directory to the 
squeue directory. 


2. Writes an entry at the end of the file joblog containing: 


• name of file to be transmitted 
• submitter’s user ID 
• number of card images in file 
• message level for this job. 


The file joblog is used to notify rjexmit of work to be done. 


3. Notifies the user that the file has been queued. 


Send determines the host system desired and invokes the proper rje?qer 
by getting the prefix from the lines file (e.g., if sending to IBM C from our 
machine, rje2qer would be invoked). 
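

The lookup that send performs can be pictured as a search of the lines 
file by host name. The following Python sketch is an illustration under 
assumptions, not the logic of send itself: the function name 
queuer_for_host is hypothetical, and the sketch assumes white-space 
separated columns and that the queuer lives in the subsystem directory 
under the name prefix followed by qer. 

```python
def queuer_for_host(lines_text, host):
    """Return the path of the rje?qer program serving the given host.

    lines_text is the content of the lines file. Returns None when no
    entry names the host.
    """
    for entry in lines_text.splitlines():
        fields = entry.split()
        if len(fields) >= 4 and fields[0] == host:
            directory, prefix = fields[2], fields[3]
            return f"{directory}/{prefix}qer"
    return None


LINES = """B pwba /usr/rje1 rje1 vpm0 5:5:1 1200:512:y
C pwba /usr/rje2 rje2 vpm1 1:1:1 1200:512
"""
print(queuer_for_host(LINES, "C"))
```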


9.4.2 Rjeload 


This program is used to start an RJE subsystem. Its prefix determines the 
subsystem to start (e.g., rje2load starts rje2). The following paragraphs 
explain which commands should be executed in /etc/rc when changing to 
init state 2 (multiuser). 


Rjeload requires only one argument (device specification) if an RJE 
subsystem utilizes a KMC/DMC hardware configuration. Rjeload requires a 
second argument, and optionally a third argument, if an RJE subsystem 
utilizes a KMS11. The second argument (line number specification) must be 
supplied to indicate which of the eight DMS11 line interfaces is to be used 
by the RJE subsystem for communication with the IBM host. Valid line 
numbers are 0 through 7. The third argument (“downloadkms”) must be 
supplied when rjeload is used to start the first RJE subsystem that utilizes a 
particular KMS11 (this is because a single down-load of the RJE protocol 
script permits the operation of all eight lines controlled by the KMS11). If a 
hypothetical DEC machine has four RJE subsystems, the first using a 
KMC/DMC and the remaining three using a KMS11, then the following 
commands would be used: 


rm -f /usr/rje/sque/log 
su rje -c "/usr/rje1/rje1load device0" 
su rje -c "/usr/rje2/rje2load device1 0 downloadkms" 
su rje -c "/usr/rje3/rje3load device1 1" 
su rje -c "/usr/rje4/rje4load device1 2" 


The file /usr/rje/sque/log is removed to ensure the correct operation of 
shqer. When invoked, rjeload performs the following steps: 


1. Uses VPM device from /usr/rje/lines to link the proper devices [see 
vpmset(1C)]. 


2. Loads device given as argument with RJE protocol script. 


3. Executes rje?init to start rje? processes (e.g., rje2load executes 
rje2init). 
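

The argument rules for rjeload (device only for a KMC/DMC pair; device, 
line number, and downloadkms for the first subsystem on a KMS11) can be 
captured in a small generator for the /etc/rc commands. This Python sketch 
is illustrative: the function rjeload_commands is hypothetical, and it 
assumes each subsystem directory is /usr/ followed by the prefix, as in 
the example for the four-subsystem machine. 

```python
def rjeload_commands(subsystems):
    """Build the start-up command lines for a list of RJE subsystems.

    Each subsystem is (prefix, device, kms_line); kms_line is None for a
    KMC/DMC pair and a DMS11 line number (0 through 7) for a KMS11.
    """
    commands = ["rm -f /usr/rje/sque/log"]
    loaded_kms = set()            # KMS11 devices already down-loaded
    for prefix, device, kms_line in subsystems:
        args = [device]
        if kms_line is not None:
            if not 0 <= kms_line <= 7:
                raise ValueError("valid line numbers are 0 through 7")
            args.append(str(kms_line))
            if device not in loaded_kms:
                # Only the first subsystem on a KMS11 loads the script.
                args.append("downloadkms")
                loaded_kms.add(device)
        arg_text = " ".join(args)
        commands.append(f'su rje -c "/usr/{prefix}/{prefix}load {arg_text}"')
    return commands


for line in rjeload_commands([("rje1", "device0", None),
                              ("rje2", "device1", 0),
                              ("rje3", "device1", 1),
                              ("rje4", "device1", 2)]):
    print(line)
```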


9.4.3 Rjehalt 


This program is used to halt an RJE subsystem. To halt rje2 on UNIX 
operating system machines, /usr/rje2/rje2halt is executed. This should be 
done in the shutdown procedure for your machine to ensure graceful 
termination of RJE. Rjehalt will allow only those users with permission to 
halt an RJE subsystem. Rjehalt uses the header on the file joblog to get 
the process-group of the RJE subsystem processes. This group is signaled 
to terminate. When all processes have terminated, rjehalt sends a “signoff” 
record to the host machine. This signoff record is taken from the file signoff 
(ASCII text) if it exists; otherwise, a “/*signoff” record is sent. On 
completion, rjehalt creates the file stop in the subsystem directory. The 
presence of the file stop in a subsystem directory causes rjestat to report to 
users that RJE to the corresponding host has been stopped by the operator. 


9.4.4 Rjeinit 


This program initializes an RJE subsystem. It is used by rjeload and can 
be used to restart a subsystem if the VPM script has previously been 
started. Rjeinit should only be executed by user rje. Rjeinit fails if there 
are fewer than 100 blocks or 10 inodes free in the file system. It issues a 
warning if there are fewer than 1.5X blocks (where X is the first field in the 
parameters for that line) or 100 inodes free in the file system. If rjeinit fails, 
the reason for the failure is reported; and the file dead is created containing 
“Init failed”. This will be reported by rjestat until a subsequent rjeinit 
succeeds. Rjeinit performs the following functions: 


1. Dials a remote host if specified. 


2. Truncates console response file resp. 


3. Sends a signon record to the host. The signon record is taken from 
file signon (ASCII text) if it exists; otherwise, rjeinit sends a blank 
record as a signon. 


4. Sets up pipes for process communication. 


5. Resets process-group for RJE subsystem and restarts error logging. 


6. Rebuilds joblog file from jobs queued for transmission. 


7. Notifies rjedisp (via a pipe) of any returned files still remaining in rpool 
directory. 


8. Starts appropriate background processes rje?xmit, rje?recv, and 
rje?disp. 


9. Reports started or not started. 


If failure occurs in a background process, it is reported by that process 
(error logging). The failing process will normally attempt to reboot the 
subsystem by executing rje?init with a + as its argument. When rjeinit is 
executed with + as its argument, this indicates an attempted reboot; and 
rjeinit will behave differently. 
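

The free-space thresholds given above for rjeinit (failure below 100 
blocks or 10 inodes, a warning below 1.5X blocks or 100 inodes) can be 
summarized as a single classification. The Python sketch below is a 
restatement of those thresholds only; the function name and the return 
values "fail", "warn", and "ok" are hypothetical and do not correspond to 
actual rjeinit messages. 

```python
def rjeinit_space_check(free_blocks, free_inodes, space):
    """Classify file system free space using the rjeinit thresholds.

    space is the first parameter subfield (X in the text) for the line.
    """
    if free_blocks < 100 or free_inodes < 10:
        return "fail"             # rjeinit would create the dead file
    if free_blocks < 1.5 * space or free_inodes < 100:
        return "warn"             # rjeinit starts but issues a warning
    return "ok"


# With space set to 1200, the warning level is 1800 blocks.
print(rjeinit_space_check(1500, 500, 1200))
```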


9.4.5 Rjexmit 


This program writes data to the VPM device. The rjexmit process is started 
by rjeinit and runs in the background. When running, rjexmit performs the 
following processes: 


1. Checks joblog file for files to be transmitted. This is done every 5 
seconds when not transmitting data. When transmitting data, the 
joblog is checked after transmitting one block from each active reader 
and console. 


(Reader refers to the logical readers used by RJE. Console refers to 
the RJE logical console which is separate from the logical readers.) 


2. Queues files from joblog according to first two characters of the file 
name: 


• rd*—These files are queued on the reader with the fewest 
cards. Normal use of the send command creates these files. 


• sq*—These files are queued on the last available reader to 
assure sequential transmission. Using the -x option to the 
send command creates these files. 


• co*—These files are queued on the console. The rjestat 
command creates these files. 


All files described above contain extended binary coded decimal 
interchange code (EBCDIC) data. 


3. Sends information to rjedisp (via a pipe) for use in user notification of 
job status. 


4. Builds blocks for transmission from active readers and the console. 
These blocks are built according to the multileaving protocol. 


5. Performs the following peripheral control: 


• Sends requests to open readers when jobs have been assigned 
to them. These readers are not active until a grant is received 
from rjerecv (via a pipe). 


• Halts and activates readers when waits or starts (respectively) 
are received from rjerecv. 


• Sends printer or punch grants when an open request is 
received from rjerecv. 


6. Notifies rjedisp that a file has been transmitted and unlinks the file. 


If rjexmit encounters fatal errors, it creates the dead file with an appropriate 
message and signals the other background processes to exit. If possible, 
rjexmit will attempt to reboot the RJE subsystem by executing rjeinit. 
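

The queueing rule that rjexmit applies to file names (rd, sq, and co 
prefixes) can be sketched as follows. This Python fragment is an 
illustration, not the rjexmit source: the function name choose_queue is 
hypothetical, and "last available reader" is interpreted here as the 
highest-numbered reader, which is an assumption. 

```python
def choose_queue(filename, reader_cards):
    """Pick a queue from the first two characters of a joblog file name.

    reader_cards[i] is the number of cards currently queued on reader i.
    Returns ("reader", index) or ("console", None).
    """
    prefix = filename[:2]
    if prefix == "rd":
        # Normal send: the reader with the fewest cards.
        fewest = min(range(len(reader_cards)), key=lambda i: reader_cards[i])
        return ("reader", fewest)
    if prefix == "sq":
        # Sequential send: always the same (last) reader, assumed here
        # to be the highest-numbered one.
        return ("reader", len(reader_cards) - 1)
    if prefix == "co":
        return ("console", None)
    raise ValueError(f"unrecognized file name prefix: {prefix!r}")


print(choose_queue("rd0042", [300, 20, 150]))
```

Because every sq file lands on one reader, such files are transmitted in 
the order submitted, while rd files spread across readers so that a small 
job need not wait behind a large one. 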


9.4.6 Rjerecv 


This program reads data from the VPM device. The rjerecv is started by 
rjeinit and runs in the background. When running, rjerecv performs the 
following processes: 


1. Reads blocks of data received from host system. 


2. Handles data received according to its type. The two types of data 
are: 


• Control information—rjerecv performs the following peripheral 
device control: 


a. Notifies rjexmit of grants to its requests to open readers. 


b. Passes wait and start reader information to rjexmit. 


c. Passes open requests (for printers and punches) from the 
host to rjexmit. 


• User information—The three major types of user information 
received are: 


a. Console responses and job status messages. This data is 
appended to the resp file for use by rjestat and rjedisp. 


b. The printer output from user jobs. This data is collected in 
temporary files (pr*) in the rpool directory. When a 
complete print job has been received, rjerecv notifies 
rjedisp (via a pipe) that the file is to be dispatched. 


c. The punch output from user jobs. This data is handled the 
same as printer output except that the rpool files are named 
pu*. 


3. If console response file resp exceeds 70,000 characters, rjerecv 
truncates the file. 


4. Rjerecv stops accepting output from the remote machine if the 
number of free blocks in the file system falls below space blocks. 


5. Rjerecv truncates files to size blocks if a received file exceeds this 
value. 


If rjerecv encounters fatal errors, it creates the dead file with an appropriate 
error message, signals the other background processes to exit, and reboots 
the RJE subsystem. 


9.4.7 Rjedisp 


This program dispatches user information. Rjedisp is started by rjeinit and 
runs in the background. When running, rjedisp performs the following 
processing: 


1. Dispatches output. The two types of output are printer and punch 
output. After receiving notification of output ready from rjerecv, 
rjedisp searches for a “usr=” line in the received file. The format of a 
“usr=” line is as follows: 


usr=(user,place,level) 


Rjedisp dispatches output according to the place field. 


2. Dispatches messages. The two types of messages are: 


• Job transmitted—This message is sent to the submitting user 
when rjedisp reads this event notice from the rjexmit pipe. 


• Output processing—rjedisp dispatches job output messages 
according to the options specified on the “usr=” card. A 
normal output message indicates the returned file name is 
ready. 


Messages can be masked by using the level on the “usr=” card. 


Whenever output is to be handled by shqer, rjedisp checks that 
shqer is running. This is done by looking for the shqer log file. If this 
file does not exist, rjedisp starts shqer. 
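

The search for a “usr=” specification can be illustrated with a regular 
expression. The following Python sketch is not the rjedisp implementation; 
the names USR_PATTERN and parse_usr are hypothetical, and the tolerance 
for spaces around the fields and for omitted trailing fields is an 
assumption made for the example. 

```python
import re

# Matches usr=(...) anywhere in a line; optional spaces are an assumption.
USR_PATTERN = re.compile(r"usr\s*=\s*\(([^)]*)\)")


def parse_usr(line):
    """Return (user, place, level) from a line, or None if no usr= is found."""
    match = USR_PATTERN.search(line)
    if match is None:
        return None
    parts = [part.strip() for part in match.group(1).split(",")]
    parts += [""] * (3 - len(parts))   # assumption: trailing fields optional
    return tuple(parts[:3])


# The user name and path below are invented for illustration.
print(parse_usr("//* usr=(jdoe,/usr/jdoe/out,5)"))
```

A result of None corresponds to the missing or invalid “usr=” card case, 
which is handled according to the badjobs subfield of the lines file. 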


9.4.8 Shqer 


This program executes user programs when they appear in the place field 
of the “usr=” line in a returned output file (print or punch). Shqer is started 
by rjedisp when the first output file using this feature is returned. 
Subsequent files using this feature are logged for execution by rjedisp. 
When started, shqer performs the following processing: 


1. Builds the log file from file names in /usr/rje/sque directory. Each log 
entry is the name of a file (tmp?) that contains the following 
information: 


• Name of file to be executed 
• Name of input file (file returned from IBM) 
• Name of IBM job 
• Programmer's name 
• IBM job number 
• User's name from “usr=” line 
• User's login directory 
• Minimum file system space. 


2. Shqer uses two parameters. The first is the delay time between log 
file reads. The second is a nice(2) factor which is applied to any 
programs spawned by shqer. These values are defined in 
/usr/include/rje.h (QDELAY and QNICE). 


3. When each log entry is read, the appropriate program is spawned with 
the following characteristics: 


• The returned RJE file is standard input to the program. 


• The standard and diagnostic outputs are /dev/null. 


• The LOGNAME and HOME variables are set to appropriate 
values. 


• The TZ and PATH variables are set to the following default 
values: 


TZ=EST5EDT 
PATH=/bin:/usr/bin 


If different values are desired for these variables, then the 
desired values should be set using keyword parameters at the 
time that the RJE subsystem is started. For example: 


PATH=:/bin su rje -c "/usr/rje1/rje1load device0" 


will start the rje1 subsystem and the PATH variable passed to 
programs invoked by shqer will be set to PATH=:/bin. 


• The arguments to the spawned program, in order, are: 


a. A numerical value indicating that file system free space is 
equal or above (0) or below (1) space blocks. 


b. IBM job name. 


c. Programmer's name. 


d. IBM job number. 


e. User's login name. 


4. After executing each program, the tmp? file and returned RJE file are 
removed. 
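

The way a program is spawned by shqer (returned file on standard input, 
outputs discarded, a fixed argument order, and a small environment) can be 
approximated in Python as follows. This is a sketch under assumptions and 
not the shqer source: the function names are hypothetical, LOGNAME is 
assumed to be the user's login name, and the nice(2) factor is left out 
for brevity. 

```python
import os
import subprocess


def build_invocation(program, space_low, job_name, programmer,
                     job_number, login, home, inherited=None):
    """Return (argv, env) for a program named in the place field."""
    inherited = os.environ if inherited is None else inherited
    # Argument order from the text: space flag, job name, programmer's
    # name, job number, login name.
    argv = [program, "1" if space_low else "0",
            job_name, programmer, job_number, login]
    env = {
        "LOGNAME": login,         # assumption: "appropriate value" is the login
        "HOME": home,
        # Defaults from the text unless set when RJE was started.
        "TZ": inherited.get("TZ", "EST5EDT"),
        "PATH": inherited.get("PATH", "/bin:/usr/bin"),
    }
    return argv, env


def run_on_returned_file(argv, env, returned_file):
    """Run the program with the returned RJE file as standard input."""
    with open(returned_file, "rb") as stdin:
        done = subprocess.run(argv, stdin=stdin, env=env,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
    return done.returncode
```

Because both outputs go to /dev/null, a program run this way must write 
any results it wants to keep to files of its own choosing. 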


9.5 Utility Programs 
9.5.1 Snoop 


Snoop is the generic name of a program that can be used to trace the state 
of a VPM device and its associated communications line. Snoop depends 
on the trace(7) driver for its information. It reads trace entries from 
/dev/trace and converts them into a readable form that is printed on the 
standard output. 


The usable name of snoop for a particular RJE subsystem is snoopN, 
where N is the minor device number of the VPM device. In our hypothetical 
system, vpm0 is used by the rje1 subsystem; and vpm1 is used by the rje2 
subsystem. Therefore, /usr/rje1/snoop0 and /usr/rje2/snoop1 are linked to 
/usr/rje/snoop. Each snoop prints trace entries for its associated VPM 
device. Trace entries are printed in the following form: 


sequence type information 


where: 


• sequence specifies the order of trace occurrences. It is a value 
between 0 and 99. 


• type specifies the action being traced (e.g., transfers, driver activity). 


• information describes data being transferred and driver activity. 


Refer to Figure 8-1 at the end of this chapter for the meaning of the trace 
types and associated information. 


9.5.2 Rjestat 


This program is supplied as a user command. The program's three 
functions are to list all status messages received from the IBM host system 
pertaining to a particular job, to describe the status of the RJE subsystems, 
and to provide a remote IBM status console. The remainder of this part 
describes these three functions. 


9.5.2.1 Job Status 


When invoked as 


rjestat -jhost jobname 


rjestat scans the resp file and outputs to the user all IBM status messages 
pertaining to the job jobname. 


9.5.2.2 RJE Status 


When invoked, rjestat reports the status of the RJE subsystems. If remote 
system (“host”) names are specified, only those statuses are reported. The 
rjestat uses the following rules to report the status of a subsystem: 


• Rjestat prints the contents of the file status if it exists in the 
subsystem directory. This file can contain any message the 
administrator wishes to have printed when users use rjestat. 


• If the file dead exists in the subsystem’s directory, the subsystem is 
not operating and the reason is contained in the file. Rjestat 
reports that RJE to “host” is down and prints the contents of the 
dead file as the reason. 


• If the file stop exists in the subsystem’s directory, the rjehalt program 
has been used to inhibit that RJE subsystem. Rjestat reports that 
RJE to “host” has been stopped by the operator. 


• If neither the dead nor the stop file exists, rjestat reports that RJE to 
“host” is operating normally. 


Rjestat is supplied as the user's vehicle for checking the status of RJE. It is 
not meant to be an administrative tool; however, the reason for failure can 
be used to track the problem. 
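

The reporting rules above amount to a short decision procedure over the 
files present in a subsystem directory. The Python sketch below restates 
them; it is not the rjestat source. The function name subsystem_status is 
hypothetical, the message wording is invented, and checking dead before 
stop when both files exist is an assumption, since the text does not say 
which takes precedence. 

```python
def subsystem_status(host, files):
    """Return the report lines for one subsystem.

    files maps the names of files present in the subsystem directory
    (such as "status", "dead", "stop") to their text contents.
    """
    report = []
    if "status" in files:
        # Administrator's message, printed whenever it exists.
        report.append(files["status"].rstrip())
    if "dead" in files:
        reason = files["dead"].rstrip()
        report.append(f"RJE to {host} is down: {reason}")
    elif "stop" in files:
        report.append(f"RJE to {host} has been stopped by the operator")
    else:
        report.append(f"RJE to {host} is operating normally")
    return report


print(subsystem_status("B", {"dead": "Init failed\n"}))
```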


9.5.2.3 Status Console 


To use rjestat as a status console, the -shost argument is used. Rjestat 
prints the status of the subsystem, then prompts with host: if the subsystem 
is up. Each console request is submitted to the RJE processes for 
transmission, and output is handled as specified. Rjestat checks the status 
prior to submitting each request and will tell the user to try later if the 
subsystem goes down. Rjestat allows the RJE or superuser logins to 
submit requests other than display requests. For a complete description of 
how to use the status console features, see rjestat(1C). 


9.5.3 Cvt 


This program converts any subsystem's joblog file to readable form. The 
first line printed is the process group number of the subsystem processes. 
The remaining output consists of entries in the following form: 


file user-id records level 


where “file” is the name of the submitted file, “user-id” is the submitter's 
user number, “records” is the number of card images, and “level” is the 
message level. The “records” and “level” fields are not used if the file name 
is co* (console request submitted by rjestat). 


9.6 RJE Accounting 


Each RJE subsystem will store accounting information in the acctlog file if it 
exists. It is the responsibility of the RJE administrator to create and 
maintain this file in the subsystem’s directory. Entries in this file describe 
RJE line use and are of the following form: 


day time file user records 


Each field is delimited by a tab character. The meaning of each field is as 
follows: 


1. Day—The day of occurrence in the form mm/dd. 


2. Time—The time of occurrence in the form hh:mm:ss. 


3. File—The name of the UNIX system file. The first two characters 
identify its type as follows: 


• rd/sq—The file was transmitted to the remote system. 


• pr—The print output file was received from the remote system. 


• pu—The punch output file was received from the remote 
system. 


4. User—The user ID of the user responsible for the transfer. 


5. Records—The number of records (card images) transferred for this 
file. 


Since acctlog data is not used by RJE, it should not be allowed to grow too 
large. This can be accomplished by moving or processing the file during a 
system reboot (i.e., in /etc/rc before the RJE subsystems are started). 


The following list describes some of the reports that could be generated 
from the acctlog data. Implementation of a program to produce accounting 
reports is the responsibility of the administrator. 


• Periodic Reports—By using the “day” and “time” fields in the data, 
periodic usage reports can be produced. 


• By User Reports—By using the “user” field in the data, usage-by-user 
reports can be produced. 


• By Subsystem Reports—By using the /usr/rje/lines file information 
and each acctlog file, a usage-by-subsystem (or remote system) 
report can be produced. 


Other reports can be produced using the type of file, size of jobs, etc. 
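

As one possible starting point for such a program, the following Python sketch totals transfers and records per user. It is an illustration only: the function name usage_by_user and the sample entries are hypothetical, and the sketch assumes the tab-delimited “day time file user records” layout described earlier.

```python
from collections import defaultdict


def usage_by_user(lines):
    """Return {user: [transfers, records]} summed over acctlog entries."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip blank or malformed entries
        _day, _time, _file, user, records = fields
        totals[user][0] += 1
        totals[user][1] += int(records)
    return dict(totals)


# Hypothetical entries in the "day time file user records" layout.
sample = [
    "03/14\t09:26:53\trd0001\t105\t120\n",
    "03/14\t09:31:10\tpr0001\t105\t318\n",
    "03/14\t10:02:44\trd0002\t212\t45\n",
]
for user, (transfers, records) in sorted(usage_by_user(sample).items()):
    print(f"{user}\t{transfers}\t{records}")
```

Grouping on the “day” field or on the first two characters of the “file” field instead of on “user” would yield the periodic and by-type reports in the same way.

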
9.7 Troubleshooting 


This part deals with RJE problems and some methods for resolving them. 
The topics discussed in this part are as follows: 


• Automatic Error Recovery
• Manual Error Recovery
• RJE Problems
• VPM Problems
• Trace Interpretation.
9.7.1 Automatic Error Recovery 


RJE attempts to be self-sustaining with respect to its availability. In general, 
if problems occur on the communications line or the remote machine (e.g., a 
crash), RJE will continually try to restart itself (this action will be referred to 
as “reboot”). For example, if an RJE subsystem is started using rjeload but
the IBM system is not available, a fatal error will occur. The process that 
detects this error (usually rjexmit or rjerecv) will reboot the subsystem by 
executing rjeinit with a + as its argument. When rjeinit detects a + 
argument, it waits 1 minute before attempting to bring up the subsystem. 


The rjehalt program can be used to prevent an RJE subsystem from 
rebooting itself when the remote system is not available for a known period 
of time. When the remote system is made available, the subsystem may be 
started in the normal way. 


9.7.2 Manual Error Recovery 


In order to manually recover from errors, one must know how to start and 
stop an RJE subsystem. There are two ways to start an RJE subsystem: 


• rje?load—This program loads and starts the VPM script and
executes rje?init.


• rje?init—This program starts the rje? subsystem. In order to use
this program, the VPM script must have been previously loaded and
started.


To stop the rje? subsystem, the rje?halt program should be executed. 
This stops the subsystem gracefully and will prevent a reboot. 


The rjeload program must be used to start RJE for the first time (after a 
UNIX system reboot). Subsequently, as long as the script is running, 
execution sequences of rjehalt and rjeinit will stop and start RJE. 


Manually starting and stopping RJE can be useful in tracking down 
problems. For example, if user jobs are not being submitted to the host 
machine, the following sequence can ease identification of the problem: 


1. Halt ailing subsystem. 


2. Start a snoop process in the background with its output redirected to a 
file. 


3. Restart subsystem. 
4. Scan snoop output to determine location of problem. 


The snoop program is the most useful software tool for identifying RJE 
problems. Its uses are described in the subpart “Trace Interpretation”. 


9.7.3 RJE Problems 


This part describes problems that can occur in an RJE subsystem. These 
problems generally occur when the subsystem has not been set up properly. 
The following is a list of things to check to ensure that an RJE subsystem 
has been set up properly. 


1. IBM description—The description of the remote UNIX operating 
system machine must be consistent with the description in the subpart 
“IBM Generation”. 


2. UNIX system description—The file /usr/rje/lines must be set up 
properly. The subpart “Sys5 UNIX Generation” describes this file in 
detail. 


3. VPM setup—The VPM software must be installed and the proper VPM 
and physical devices made. The permission modes of the VPM and 
physical devices must be set by the system administrator to allow read 
and write access by the RJE programs. Each VPM device must 
correspond to the proper physical device; see vpm(7). 


4. Free space—As a general rule, all file systems must have a
reasonable amount of free space. File systems containing RJE
subsystems must have sufficient free space to ensure proper RJE
operation.


5. Directories—Each subsystem’s directory and the controlling directory 
should be checked for the following: 


• All needed files exist.

• The proper prefix is on each applicable RJE program.

• The link count is correct for files that are linked.

• All file and directory modes are correct.


6. Initialization—Peripherals information must be consistent on both 
systems. The line must be started on the IBM system, proper 
hardware connections made, etc. 


Problems with a subsystem are indicated by error messages. The rjeinit
program checks for obstacles in bringing up RJE. If an obstacle is found, an error
message indicating the obstacle is printed on the error output. If a problem 
is encountered during normal operation, the message is logged in the errlog 
file. This file, error messages, output from snoop, and the checklist above 
should be used to determine and fix any subsystem problems. Generally, if 
a subsystem is set up properly but will not operate, the problem is the way 
the VPM or KMC has been set up, the remote system, or the hardware. 


9.7.4 VPM Problems 


After installing the hardware and making the appropriate devices, all VPM 
software and devices must be made [see vpm(7)]. The program rjeload 
links the devices to be used by the corresponding RJE subsystem. 


The following is a list of items to check when problems occur: 


1. Proper hardware—The appropriate hardware must be installed. Be 
sure the device is properly described to the system and passes 
diagnostics. 


2. Proper devices—The major and minor device numbers for the physical 
device and VPM devices must be correct. It should also be verified
that the rjeload program is called with the correct physical device names.


3. Script runs—Verify the VPM script is able to run. This is done by 
tracing the proper device with the proper snoop program. Snoop will 
print “started” entries for both the physical device and VPM script. If 
no output appears from snoop when rjeload is executed, either the 
hardware is not working properly or the hardware or VPM has not 
been set up properly. If trace information is output for a period of time 
by snoop and the output abruptly stops, a modem problem should be 
suspected. That is, if the RJE cable is disconnected from its
associated modem or the modem is not powered up and optioned
properly, then snoop output will “freeze” when the physical device
attempts to transmit over the RJE link. Output of any other type from
snoop should indicate where the problem is occurring.


9.7.5 Trace Interpretation 


This part describes how to interpret trace output from the snoop program 
and gives several examples. 


Lines with type TR are traces from the VPM script. All others are driver 
traces and indicate the following: 


• CL—Activity occurring when the device has been closed.

• OP—Activity occurring when the device has been opened.

• RD—Read from device occurred.

• WR—Write to device occurred.

• ST—Start or stop activity.

• SC—Script termination type, termination value is given.


Figure 8-1 at the end of this chapter enumerates all possible trace lines for
each type and describes the event. The remainder of this part consists of
example trace output and its interpretation. Comments describing events
will appear after the “*” in trace output. If more than one VPM were
running, sequence numbers might not appear in order. For clarity, example
sequences will be in order.
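

When a large amount of snoop output has been redirected to a file, it can help to break each line into its parts before scanning it. The following Python sketch assumes that each trace line has the layout used in the examples of this part: a sequence number, a two-letter type, an event, and an optional comment after “*”. The names TraceLine and parse_trace_line are hypothetical and are not part of RJE.

```python
import re
from typing import NamedTuple, Optional

# sequence number, two-letter type, event text, optional "* comment"
TRACE_LINE = re.compile(r"^\s*(\d+)\s+([A-Z]{2})\s+(.*?)\s*(?:\*\s*(.*))?$")


class TraceLine(NamedTuple):
    seq: int        # sequence number
    kind: str       # TR for script traces; CL, OP, RD, WR, ST, SC for driver traces
    event: str      # e.g. "S-ENQ" or "84 bytes"
    comment: str    # text after "*", empty if absent


def parse_trace_line(line: str) -> Optional[TraceLine]:
    """Return the parts of one trace line, or None for headers and blank lines."""
    m = TRACE_LINE.match(line)
    if m is None:
        return None
    seq, kind, event, comment = m.groups()
    return TraceLine(int(seq), kind, event, (comment or "").strip())


print(parse_trace_line("4 WR 84 bytes * Signon record written"))
print(parse_trace_line("Tracing vpm0"))
```

Lines such as the “Tracing vpm0” header do not begin with a sequence number, so the sketch returns None for them and they can be skipped.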
9.7.5.1 Normal RJE Startup 


The following is an example of trace output when RJE has been started up. 
In this case, the remote machine responds to the enquiry byte (ENQ). The 
RJE subsystem signs on to the machine then follows the handshaking 
protocol [exchanging acknowledges (ACKs)]. 


Tracing vpm0


 0  ST  Startack      * Physical device started
 1  TR  Started       * Script started
 2  ST  Start         * VPM Driver start
 3  OP  Opened        * VPM Device open
 4  WR  84 bytes      * Signon record written
 5  TR  S-ENQ         * Enquiry byte sent
 6  TR  R-ACK         * Received acknowledgment
 7  TR  S-BLK         * Sent signon block
 8  TR  R-ACK         * Block acknowledged
 9  TR  S-ACK         * Handshaking
10  TR  R-ACK             .
11  TR  S-ACK             .
12  TR  R-ACK             .
13  TR  S-ACK             .
14  TR  R-ACK             .
15  TR  S-ACK             .
16  TR  R-ACK             .
17  TR  S-ACK         * Handshaking


If any jobs had been submitted via the send command or jobs were 
waiting to be returned, the traces would reflect the transfers rather than 
handshaking. 


9.7.5.2 RJE Startup—IBM not responding 


This example shows trace output when RJE has been started but does not 
receive a response from the remote machine. In general, the RJE script will 
time-out if a response is not received from the remote machine within 3 
seconds of the last transmission. When a time-out is detected while starting 
up, the ENQ is retransmitted. This is repeated six times before the script 
gives up. Other time-out responses will be discussed later. 


Tracing vpm0 


86  ST  Startack      * Physical device started
87  TR  Started       * Script started
88  ST  Start         * VPM Driver start
89  OP  Opened        * VPM device open
90  WR  84 bytes      * Signon record written
91  TR  S-ENQ         * Enquiry byte sent
92  TR  TIMEOUT       * No response to enquiry
93  TR  S-ENQ         * Enquiry byte sent
94  TR  TIMEOUT       * No response
95  TR  S-ENQ         * Enquiry byte sent
96  TR  TIMEOUT       * No response
97  TR  S-ENQ         * Enquiry byte sent
98  TR  TIMEOUT       * No response
99  TR  S-ENQ         * Enquiry byte sent
 0  TR  TIMEOUT       * No response
 1  TR  S-ENQ         * Enquiry byte sent
 2  TR  TIMEOUT       * No response
 3  RD  1 bytes       * 1-byte read (error)
 4  ST  Stopchk       * Safety check
 5  ST  Stopack(0)    * Script termination normal
 6  CL  Clean         * Cleanup done
 7  ST  Stopped       * VPM script stopped
 8  CL  Closed        * VPM device closed


The above sequence will be repeated approximately every minute until a
positive response is received from the host. During that minute, the RJE
subsystem is dormant, and the rjestat command will report that IBM is not
responding. When this occurs, either the IBM machine is unavailable (it is
down, the line has not been started, etc.) or there is a communications
problem somewhere between the point where the physical device transmits
data and the point where it receives data. The RJE administrator should
first verify that the IBM machine is up and that the communications line
has been started. If so, a hardware trace of the communications line
should be done to aid in detecting the problem.
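

The retry behavior seen in this trace can be modeled in a few lines of Python. This is a sketch for illustration, not the VPM script itself: the function try_startup and its wait_for_ack argument are hypothetical, the 3-second time-out comes from the text, and the limit of six ENQ transmissions is taken from the six S-ENQ entries in the trace above.

```python
ENQ_TIMEOUT_SECONDS = 3   # time-out after the last transmission (from the text)
MAX_ENQ_ATTEMPTS = 6      # S-ENQ entries seen in the trace before the script gives up


def try_startup(wait_for_ack):
    """Return the script trace events for one startup attempt.

    wait_for_ack(timeout) is a caller-supplied (hypothetical) function that
    returns True if an ACK arrives within `timeout` seconds.
    """
    events = []
    for _ in range(MAX_ENQ_ATTEMPTS):
        events.append("S-ENQ")
        if wait_for_ack(ENQ_TIMEOUT_SECONDS):
            events.append("R-ACK")
            return events
        events.append("TIMEOUT")
    return events  # no response: the subsystem is rebooted about a minute later


# Remote machine never answers: six ENQ/TIMEOUT pairs, as in the trace.
print(try_startup(lambda timeout: False))
```

With a wait_for_ack that returns True on the first call, the sketch produces the S-ENQ, R-ACK pair seen in the normal startup example.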


9.7.5.3 Transmitting and Receiving 


This example shows trace output from the start of job transmission through
its return. For simplicity, only one job is being transmitted and returned.


Tracing vpm0 


94  TR  R-ACK         * Handshaking
95  TR  S-ACK             .
96  TR  R-ACK             .
97  TR  S-ACK         * Handshaking
98  WR  4 bytes       * Open reader request written
99  TR  R-ACK         * Handshaking
 0  TR  S-BLK         * Sent open request block
 1  TR  R-OKBLK       * Received block (grant)
 2  TR  S-ACK         * Block acknowledged
 3  RD  7 bytes       * Read seven bytes (grant)
 4  TR  R-ACK         * Handshaking
 5  TR  S-ACK         * Handshaking
 6  WR  481 bytes     * First block written
 7  WR  470 bytes     * Second block written
 8  TR  R-ACK         * Handshaking
 9  TR  S-BLK         * First block sent
10  TR  R-ACK         * Block acknowledged
11  WR  470 bytes     * Third block written
12  TR  S-BLK         * Second block sent
13  TR  R-OKBLK       * Received block (on reader msg)
14  WR  470 bytes     * Fourth block written
15  RD  66 bytes      * Read 66 bytes (on reader msg)
16  TR  S-BLK         * Third block sent
17  TR  R-ACK         * Block acknowledged
18  WR  147 bytes     * Fifth block written
19  TR  S-BLK         * Fourth block sent
20  TR  R-ACK         * Block acknowledged
                      * More of the same
93  TR  R-ACK         * Handshaking
94  TR  S-ACK         * Handshaking
95  TR  R-OKBLK       * Received block (request)
96  TR  S-ACK         * Block acknowledged
97  RD  7 bytes       * Read open printer request
98  TR  R-ACK         * Handshaking
99  TR  S-ACK             .
 0  TR  R-ACK             .
 1  TR  S-ACK             .
 2  TR  R-ACK             .
 3  TR  S-ACK         * Handshaking
 4  WR  4 bytes       * Printer grant written
 5  TR  R-ACK         * Handshaking
 6  TR  S-BLK         * Block sent (grant)
 7  TR  R-OKBLK       * First block received
 8  TR  S-ACK         * Block acknowledged
 9  RD  64 bytes      * Read first block
10  TR  R-OKBLK       * Second block received
11  TR  S-ACK         * Block acknowledged
12  RD  505 bytes     * Read second block
13  TR  R-OKBLK       * Third block received
14  TR  S-ACK         * Block acknowledged
15  TR  R-OKBLK       * Fourth block received
16  TR  S-ACK         * Block acknowledged
17  TR  R-ACK         * Handshaking
18  TR  S-ACK             .
19  TR  R-ACK             .
20  TR  S-ACK         * Handshaking
21  RD  470 bytes     * Read third block
22  RD  494 bytes     * Read fourth block
23  TR  R-ACK         * Handshaking
24  TR  S-ACK         * Handshaking
                      * etc.


Requests and grants are part of the multileaving protocol. When jobs are
being transmitted and received simultaneously, as in a busier RJE
subsystem, much less handshaking is involved. Rather than acknowledging
blocks with ACKs, the protocol allows a block to be returned (this implies
acknowledgment of the received block). The following example shows trace
output at a busy time:


Tracing vpm0


45  TR  R-OKBLK       * Received block
46  TR  S-BLK         * Sent block
47  WR  493 bytes
48  RD  496 bytes
49  TR  R-OKBLK       * Received block
50  RD  65 bytes
51  WR  4 bytes
52  TR  S-BLK         * Sent block
53  TR  R-OKBLK       * Received block
54  TR  S-BLK         * Sent block
55  WR  493 bytes
56  RD  7 bytes
57  TR  R-OKBLK       * Received block
58  WR  493 bytes
59  RD  496 bytes
60  TR  S-BLK         * Sent block
61  TR  R-OKBLK       * Received block


Notice that since there is work to be done on both sides, acknowledgments
are implied.


9.7.5.4 Trace Output Indicating Performance Problems


Trace output is useful in detecting performance problems on the remote IBM 
system or on the local UNIX system. 


The first example shows activity resulting from time-outs occurring during
normal operation. These time-outs occurred because the remote JES3
system has performance problems and occasionally does not respond in
the required 3 seconds.


Tracing vpm0 


27  TR  S-ACK         * Handshaking
28  TR  R-ACK             .
29  TR  S-ACK             .
30  TR  TIMEOUT       * No response
31  TR  S-NAK         * Not acknowledged
32  TR  TIMEOUT       * No response
33  TR  S-NAK         * Not acknowledged
34  TR  R-ACK         * Response
35  TR  S-ACK         * Handshaking
36  TR  R-ACK             .
                          .
54  TR  R-ACK             .
55  TR  S-ACK         * Handshaking
56  TR  TIMEOUT       * No response
57  TR  S-NAK         * Not acknowledged
58  TR  R-ACK         * Response
59  TR  S-ACK         * Handshaking


The responses to these time-outs are NAKs (not acknowledged). RJE will
respond this way up to six times before giving up and attempting a reboot. 
At this time, rjestat would report that there are “Line Errors”. NAK is a 
request to retransmit the previous response. 
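

When scanning a long trace for this condition, it can be useful to know how close the subsystem came to the six-NAK limit. The following Python sketch counts the longest run of S-NAK events that was not interrupted by a good response. The function name longest_nak_run is hypothetical, and treating R-ACK and R-OKBLK as the events that mark recovery is an assumption based on the example traces, not a statement of how the script keeps its own count.

```python
RECOVERY_EVENTS = {"R-ACK", "R-OKBLK"}  # assumed to mark a successful recovery
NAK_LIMIT = 6  # from the text: up to six NAKs before a reboot is attempted


def longest_nak_run(events):
    """Return the largest number of S-NAKs sent without an intervening recovery."""
    longest = current = 0
    for event in events:
        if event == "S-NAK":
            current += 1
            longest = max(longest, current)
        elif event in RECOVERY_EVENTS:
            current = 0
    return longest


# Script events 27 through 36 of the trace above.
events = ["S-ACK", "R-ACK", "S-ACK", "TIMEOUT", "S-NAK",
          "TIMEOUT", "S-NAK", "R-ACK", "S-ACK", "R-ACK"]
run = longest_nak_run(events)
print(run, "of", NAK_LIMIT)
```

A run that regularly approaches the limit suggests that reboots, and the “Line Errors” report from rjestat, are likely to follow.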


In the second example, time-outs occur because the local UNIX system has 
performance problems. When RJE is run on a heavily-loaded UNIX system, 
the RJE script occasionally pauses for a short period before sending 
acknowledgment messages to the remote host. Each pause serves to 
throttle the rate at which data may pass from the remote system to the 
UNIX system. Unfortunately, on a severely overloaded UNIX system this 
mechanism for controlling data flow cannot guarantee proper RJE operation. 
Time-outs result from the UNIX system's inability to respond to the remote 
system in the required 3 seconds. 


                      *
                      * UNIX system heavily loaded -
                      * Time-outs successfully avoided
                      * by pausing before sending
                      * acknowledgments to remote system
                      *
S-ACK                 * Previous block acknowledged
R-OKBLK               * Received block
S-ACK                 * Block acknowledged
R-OKBLK               * Received block
505 bytes
S-ACK                 * Block acknowledged
R-OKBLK               * Received block
S-ACK                 * Block acknowledged
R-OKBLK               * Received block
PAUSE-ACK             * Script pauses before acknowledging
S-ACK                 * Block acknowledged
475 bytes
466 bytes
R-OKBLK               * Received block
    .
    .
                      *
                      * UNIX system severely overloaded -
                      * Time-outs cannot be totally
                      * prevented by pausing before
                      * sending acknowledgments
                      *
S-ACK                 * Previous block acknowledged
R-OKBLK               * Block received
S-ACK                 * Block acknowledged
R-OKBLK               * Block received
S-ACK                 * Block acknowledged
R-OKBLK               * Block received
PAUSE-ACK             * Script pauses before acknowledging
405 bytes
503 bytes
473 bytes
S-ACK                 * Block acknowledged (but not in 3 sec)
TIMEOUT               * No response (ACK was sent too late)
S-NAK                 * Attempt to recover
R-OKBLK               * Block received
S-ACK                 * Block acknowledged
417 bytes
R-OKBLK               * Block received
S-ACK                 * Block acknowledged
    .
    .
In such instances, the reasons for the overloading of the UNIX system
should be investigated and remedied. If system overloading cannot be 
prevented, then RJE operation should be limited to those time periods when 
there is less contention for system resources. 


9.7.5.5 Communication Line Errors 


This example shows trace output from an RJE subsystem that uses a dial-up
connection. The phone line is noisy and is prone to dropping.


Tracing vpm0 


63  TR  S-ACK         * Handshaking
64  TR  R-ACK             .
65  TR  S-ACK             .
66  TR  R-JUNK        * Noise on the line
67  TR  S-NAK         * Not acknowledged
68  TR  R-ACK         * Recovery
69  TR  S-ACK             .
70  TR  R-ACK             .
71  TR  S-ACK             .
72  TR  R-JUNK        * Noise on the line
73  TR  S-NAK         * Attempting to recover
74  TR  R-JUNK            .
75  TR  S-NAK             .
76  TR  R-JUNK            .
77  TR  S-NAK             .
78  TR  R-JUNK            .
79  TR  S-NAK             .
80  TR  R-JUNK            .
81  TR  S-NAK             .
82  TR  R-JUNK            .
83  RD  1 bytes       * 1-byte read (error)
84  ST  Stopack(0)    * Script termination normal
85  CL  Clean         * Cleanup
86  ST  Stopped       * VPM script stopped
87  CL  Closed        * VPM device closed


The error read in the above sequence causes RJE to reboot and rjestat to 
report line errors. If this type of problem were to occur frequently, the RJE 
link should be tested and the hardware connections to the link examined 
and replaced if necessary. 


9.7.5.6 Error Responses 


As seen in the parts above, the response to most errors is to send a NAK. 
The only exception is when starting up. Whenever a NAK is received on 
either side, it indicates that the previous transmission was not properly 
received. This should be followed by retransmission of the previous data. 
Generally, NAKs should not occur frequently and should be followed by 
recovery. If errors occur frequently or NAKs do not cause recovery, the line 
should be checked for problems. 
On some IBM systems (e.g., JES2), an I/O error is printed at the system
console whenever a NAK is received. These I/O errors can also be helpful
in detecting the problem; however, they will not be discussed here as they
vary with the system. It is assumed that someone in IBM support can assist
if needed.


CL  Closed          The virtual protocol machine (VPM) device has
                    been closed.

CL  Clean           The VPM driver is cleaning up for this device.

OP  Opened          The VPM has been successfully opened.

OP  Failed(open)    The open failed because the device was already
                    open.

OP  Failed(dev)     The open failed because the device number was
                    out of range.

OP  Failed(set)     The open failed because the physical device
                    could not be reset.

RR  Buf             The VPM script has returned a receive buffer to
                    the VPM driver.

AX  Buf             The VPM script has returned a transmit buffer to
                    the VPM driver.

RD  num bytes       Num bytes were read from the VPM device by
                    rjerecv.

SC  Exit(num)       The VPM script has terminated. The VPM exit
                    code is num. Exit codes are defined in vpm(7).

ST  Startup         The physical device has been started.

ST  Stopped         The VPM script has been stopped.

TR  Started         The script has started tracing.

TR  R-ACK           A 2-byte acknowledgment (ACK) string has been
                    received from the remote system. This indicates
                    that the previous transmission was properly
                    received.

TR  S-ACK           A 2-byte ACK string has been transmitted to the
                    remote system.

TR  R-NAK           A “not-acknowledged” (NAK) character has been
                    received from the remote system. This indicates
                    that the previous transmission was not properly
                    received.

TR  S-NAK           A NAK character has been transmitted to the
                    remote system.

TR  R-ENQ           An enquiry (ENQ) character has been received
                    from the remote system.

TR  S-ENQ           An ENQ character has been transmitted to the
                    remote system.

TR  R-WAIT          The remote machine has requested that no data
                    be transmitted to it.

TR  R-OKBLK         A valid data block was received from the remote
                    machine.

TR  R-ERRBLK        An invalid cyclic redundancy check (CRC) was
                    received with a data block.

TR  R-SEQERR        The block sequence count on a received data
                    block was invalid.

TR  R-JUNK          An invalid data block was received from the
                    remote system.

TR  TIMEOUT         The remote machine did not respond within 3
                    seconds.

TR  S-BLK           A data block has been transmitted to the remote
                    system.

TR  PAUSE-ACK       The script has paused prior to sending an
                    acknowledgment string to the remote system.

WR  num bytes       Num bytes were written to the VPM device by
                    rjexmit.


10. SYSTEM ACTIVITY PACKAGE


This chapter describes the design and implementation of the UNIX System 
Activity Package. The UNIX operating system contains a number of 
counters that are incremented as various system actions occur. The system 
activity package reports UNIX system-wide measurements including central 
processing unit (CPU) utilization, disk and tape input/output (I/O) activities, 
terminal device activity, buffer usage, system calls, system switching and 
swapping, file-access activity, queue activity, and message and semaphore 
activities. 


Throughout this chapter, each reference of the form name(1M), name(7), or
name(8) refers to entries in the Sys5 UNIX Administrator Reference
Manual. References to entries of the form name(N), where "N" is the
number 1 or 6 possibly followed by a letter, refer to entry name in section N
of the Sys5 UNIX User Reference Manual. If "N" is a number 2 through 5
possibly followed by a letter, the reference is to entry name in section N of
the Sys5 UNIX Programmer Reference Manual.


The package provides four commands that generate various types of 
reports. Procedures that automatically generate daily reports are also 
included. The five functions of the activity package are: 


• sar(1) command—allows a user to generate system activity reports in
real-time and to save system activities in a file for later usage.


• sag(1G) command—displays system activity in a graphical form.


• sadp(1M) command—samples disk activity once every second during
a specified time interval and reports disk usage and seek distance in
either tabular or histogram form.


• timex(1)—a modified time(1) command that times a command and
also (optionally) reports concurrent system activity and process
accounting activity.


• system activity daily reports—procedures are provided for sampling
and saving system activities in a data file periodically and for
generating the daily report from the data file.


The system activity information reported by this package is derived from a 
set of system counters located in the operating system kernel. These
system counters are described in the part “System Activity Counters”. The 
part “System Activity Commands” describes the commands provided by this 
package. The procedure for generating daily reports is given in “Daily 
Report Generation”. For a description of the files used by the system 
activity package, see Attachment 10-1 at the end of this chapter. 
10.1 System Activity Counters


The UNIX operating system manages a number of counters that record 
various activities and provide the basis for the system activity reporting 
system. The data structure for most of these counters is defined in the 
sysinfo structure in /usr/include/sys/sysinfo.h (see Attachment 10-2 at the 
end of this chapter). The system table overflow counters are kept in the 
_syserr structure. The device activity counters are extracted from the 
device status tables. In this version, the I/O activity of the following devices
is recorded: RP06, RM05, RS04, RF11, RK05, RP03, RL02, TM03, and
TM11.


The following paragraphs describe the system activity counters sampled by
the system activity package.


Cpu time counters—There are four time counters that may be incremented
at each clock interrupt, which occurs 60 times per second. According to the
mode the CPU is in at the interrupt (idle, user, kernel, or wait for I/O
completion), exactly one of the cpu[] counters is incremented.
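

Because exactly one counter advances per clock interrupt, the share of time spent in each mode over an interval follows directly from the differences between two samples. The Python sketch below illustrates the arithmetic. The function name, the sample values, and the ordering of the four modes within the array are assumptions made for the example and should be checked against sysinfo.h.

```python
CPU_MODES = ("idle", "user", "kernel", "wait")  # ordering is illustrative


def cpu_percentages(before, after):
    """Percentage of clock ticks spent in each mode between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return {mode: 0.0 for mode in CPU_MODES}
    return {mode: 100.0 * d / total for mode, d in zip(CPU_MODES, deltas)}


# Hypothetical samples taken 10 seconds apart (600 ticks at 60 per second).
before = [1000, 200, 100, 50]
after = [1300, 380, 160, 110]
print(cpu_percentages(before, after))
```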


Lread and lwrite—The lread and lwrite counters are used to count logical
read and write requests issued by the system to block devices.


Bread and bwrite—The bread and bwrite counters are used to count the
number of times data is transferred between the system buffers and the
block devices. These actual I/Os are triggered by logical I/Os that cannot be
satisfied by the current contents of the buffers. The ratio of block I/O to
logical I/O is a common measure of the effectiveness of the system
buffering.
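

One common way to express this ratio is as the percentage of logical requests that did not require a block transfer. The following Python sketch shows the calculation for the increases in a pair of counters (for example, lread and bread) over one sampling interval. The function name and the sample values are hypothetical.

```python
def cache_hit_percent(logical, block):
    """Percentage of logical requests satisfied from the system buffers.

    `logical` and `block` are the increases in a logical counter and the
    matching block counter over one sampling interval.
    """
    if logical == 0:
        return None  # no logical requests, so the ratio is undefined
    return 100.0 * (logical - block) / logical


# Hypothetical interval: 1000 logical reads caused 150 block reads.
print(cache_hit_percent(logical=1000, block=150))
```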


Phread and phwrite—The phread and phwrite counters count read and 
write requests issued by the system to raw devices. 


Swapin and swapout—The swapin and swapout counters are incremented 
for each system request initiating a transfer from or to the swap device. 
More than one request is usually involved in bringing a process in to or out 
of memory because text and data are handled separately. Frequently used 
programs are kept on the swap device and are swapped in rather than 
loaded from the file system. The swapin counter reflects these initial 
loading operations as well as resumptions of activity, while the swapout 
counter reveals the level of actual “swapping.” The amounts of data
transferred between the swap device and memory are measured in blocks
and counted by bswapin and bswapout.


Pswitch and syscall—These counters are related to the management of
multiprogramming. Syscall is incremented every time a system call is
invoked. The numbers of invocations of read(2), write(2), fork(2), and
exec(2) system calls are kept in counters sysread, syswrite, sysfork, and
sysexec, respectively. Pswitch counts the times the switcher was invoked,
which occurs when:


1. A system call resulted in a road block 
2. An interrupt occurred resulting in awakening a higher priority process 
3. A 1 second clock interrupt occurs. 


Iget, namei, and dirblk—These counters apply to file-access operations.
Iget and namei, in particular, are the names of UNIX operating system
routines. The counters record the number of times the respective routines
are called. Namei is the routine that performs file system path searches. It
searches the various directory files to get the associated i-number of a file
corresponding to a special path. Iget is a routine called to locate the inode
entry of a file (i-number). It first searches the in-core inode table. If the
inode entry is not in the table, routine iget will get the inode from the file
system where the file resides and make an entry in the in-core inode table
for the file. Iget returns a pointer to this entry. Namei calls iget, but other
file access routines also call iget. Therefore, counter iget is always greater
than counter namei.


Counter dirblk records the number of directory block reads issued by the
system. It is noted that the directory blocks read divided by the number of
namei calls estimates the average path length of files.
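

The estimate described above is a single division over the increases in the two counters. A minimal Python sketch follows, with a hypothetical function name and sample values.

```python
def average_path_length(dirblk, namei):
    """Estimate directory blocks read per path search over one interval."""
    if namei == 0:
        return None  # no path searches in the interval
    return dirblk / namei


# Hypothetical counter increases over one sampling interval.
print(average_path_length(dirblk=840, namei=280))
```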


Runque, runocc, swpque, and swpocc—These counters are used to 
record queue activities. They are implemented in the clock.c routine. At 
every 1 second interval, the clock routine examines the process table to see 
whether any processes are in core and in ready state. If so, the counter 
runocc is incremented and the number of such processes are added to 
counter runque. While examining the process table, the clock routine also 
checks whether any processes in the swap device are in ready state. The 
counter swpocc is incremented if the swap queue is occupied, and the 
number of processes in swap queue is added to counter swpque. 
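

Two figures can be derived from each pair of queue counters: the average queue length during the seconds when the queue was occupied, and the percentage of seconds in which it was occupied. The Python sketch below shows one way to compute them from counter increases over an interval. The function name and sample values are hypothetical, and the sketch relies on the 1-second sampling stated above.

```python
def queue_stats(que, occ, elapsed_seconds):
    """Return (average length while occupied, percent of seconds occupied).

    `que` and `occ` are the increases in a queue-length counter (runque or
    swpque) and its occupancy counter (runocc or swpocc) over the interval.
    """
    avg_len = que / occ if occ else 0.0
    pct_occ = 100.0 * occ / elapsed_seconds if elapsed_seconds else 0.0
    return avg_len, pct_occ


# Hypothetical 60-second interval: the run queue was occupied in 45 of the
# 60 checks, and 90 ready processes were counted in total.
print(queue_stats(que=90, occ=45, elapsed_seconds=60))
```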


Readch and writech—The readch and writech counters record the total
number of bytes (characters) transferred by the read and write system
calls, respectively.


Monitoring terminal device activities—There are six counters monitoring
terminal device activities. Rcvint, xmtint, and mdmint are counters
measuring hardware interrupt occurrences for receiver, transmitter, and
modem individually. Rawch, canch, and outch count the number of characters
in the raw queue, canonical queue, and output queue. Characters
generated by devices operating in the cooked mode, such as terminals, are
counted in both rawch and (as edited) in canch; but characters from raw
devices, such as communication processors, are counted only in rawch.


Msg and sema counters—These counters record message sending and
receiving activities and semaphore operations, respectively. 


Monitoring I/O activities—As to the I/O activity for a disk or tape device,
four counters are kept for each disk or tape drive in the device status table.
Counter io_ops is incremented when an I/O operation has occurred on the
device. It includes block I/O, swap I/O, and physical I/O. Io_bcnt counts
the amount of data transferred between the device and memory in 512-byte
units. Io_act and io_resp measure the active time and response time of a
device in time ticks summed over all I/O requests that have completed for
each device. The device active time includes the device seeking, rotating,
and data transferring times, while the response time of an I/O operation is
from the time the I/O request is queued to the device to the time when the
I/O completes.
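

The four device counters lend themselves to a few simple derived figures. The Python sketch below is an illustration only: the function name, the sample values, and the choice of derived figures are hypothetical, and it assumes that the time ticks are the 60-per-second clock ticks mentioned for the CPU time counters. The formulas actually used by the reports are given in Attachment 10-3.

```python
HZ = 60  # assumed ticks per second, as stated for the CPU time counters


def disk_stats(io_ops, io_bcnt, io_act, io_resp, elapsed_seconds):
    """Derive per-interval figures from increases in the four device counters."""
    if io_ops == 0 or elapsed_seconds == 0:
        return None  # nothing to report for an idle device or empty interval
    return {
        "busy_pct": 100.0 * io_act / (elapsed_seconds * HZ),
        "ops_per_sec": io_ops / elapsed_seconds,
        "bytes": io_bcnt * 512,                              # 512-byte units
        "avg_service_ms": 1000.0 * io_act / HZ / io_ops,     # active time per request
        "avg_wait_ms": 1000.0 * (io_resp - io_act) / HZ / io_ops,  # queued, not active
    }


# Hypothetical 10-second interval.
stats = disk_stats(io_ops=100, io_bcnt=960, io_act=300, io_resp=420,
                   elapsed_seconds=10)
print(stats)
```

Subtracting the active time from the response time isolates the time requests spent waiting in the device queue, which is the design choice behind the last figure.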


Inodeovf, fileovf, textovf, and procovf—These counters are extracted 
from _syserr structure. When an overflow occurs in any of the inode, file, 
text, and process tables, the corresponding overflow counter is incremented. 


10.2 System Activity Commands 


The system activity package provides three commands for generating
various system activity reports and one command for profiling disk activities.
These tools facilitate observation of system activity during:


• A controlled stand-alone test of a large system


• An uncontrolled run of a program to observe the operating
environment


• Normal production operation.


Commands sar and sag permit the user to specify a sampling interval and 
number of intervals for examining system activity and then to display the 
observed level of activity in tabular or graphical form. The timex command 
reports the amount of system activity that occurred during the precise period 
of execution of a timed command. The sadp command allows the user to 
establish a sampling period during which access location and seek distance 
on specified disks are recorded and later displayed as a tabular summary or 
as a histogram. 


10.2.1 The “sar” Command


The sar command can be used in the following two ways:


• When the frequency arguments t and n are specified, it invokes the
data collection program sadc to sample the system activity counters
in the operating system every t seconds for n intervals and generates
system activity reports in real-time. Generally, it is desirable to
include the option to save the sampled data in a file for later
examination. The format of the data file is shown in sar(1M). In
addition to the system counters, a time stamp is also included. It
gives the time at which the sample was taken.


• If no frequency arguments are supplied, it generates system activity
reports for a specified time interval from an existing data file that was
created by sar at an earlier time.


A convenient usage is to run sar as a background process saving its 
samples in a temporary file but sending its standard output to /dev/null. 
Then an experiment is conducted after which the system activity is extracted 
from the temporary file. The sar(1) manual entry describes the usage and 
lists various types of reports. Attachment 10-3 (at the end of this chapter) 
gives the formula for deriving each reported item. 


10.2.2 The “sag” Command


Sag displays system activity data graphically. It relies on the data file 
produced by a prior run of sar after which any column of data or the 
combination of columns of data of the sar report can be plotted. A fairly 
simple but powerful command syntax allows the specification of cross plots 
or time plots. Data items are selected using the sar column header names. 
The sag(1G) manual entry describes its options and usage. The system
activity graphical program invokes graphics(1G) and tplot(1G) commands 
to have the graphical output displayed on any of the terminal types 
supported by tplot. 


10.2.3 The “timex” Command


The timex command is an extension of the time(1) command. Without 
options, timex behaves like time. In addition to giving the time information, 
it can also print a system activity report and a process accounting report. 
For all the options available, refer to the manual entry timex(1). It should be 
emphasized that the user and sys times reported in the second and third 
lines are for the measured process itself including all its children while the 
remaining data (including the cpu user % and cpu sys %) are for the entire
system. 


While the normal use of timex will probably be to measure a single 
command, multiple commands can also be timed either by combining them 
in an executable file and timing it or by typing: 


timex sh -c "cmd1; cmd2; ... ;"


This establishes the necessary parent-child relationships to correctly extract
the user and system times consumed by cmd1, cmd2, ... (and the shell).


10.2.4 The “sadp” Command


Sadp is a user level program that can be invoked independently by any 
user. It requires no storage or extra code in the operating system and 
allows the user to specify the disks to be monitored. The program is 
reawakened every second, reads system tables from /dev/kmem, and 
extracts the required information. Because of the 1 second sampling, only a 
small fraction of disk requests are observed; however, comparative studies 
have shown that the statistical determination of disk locality is adequate 
when sufficient samples are collected. 


In the operating system, there is an iobuf for each disk drive. It contains
two pointers, the head and tail of the I/O active queue for the device.
The actual requests in the queue may be found in three buffer header
pools: system buffer headers for block I/O requests, physical buffer
headers for physical I/O requests, and swap buffer headers for swap I/O.
Each buffer header has a forward pointer that points to the next request in
the I/O active queue and a backward pointer that points to the previous
request.


Sadp snapshots the iobuf of the monitored device and the three buffer
header pools once every second during the monitoring period. It then traces
the requests in the I/O queue and records the disk access location and seek
distance in buckets of 8-cylinder increments. At the end of the monitoring
period, it prints out the sampled data. The output of sadp can be used to
balance load among disk drives and to rearrange the layout of a particular
disk pack. The usage of this command is described in manual entry
sadp(1M).
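

The bucketing step described above can be illustrated with a short sketch.
This is not the sadp source; it is a minimal Python illustration that assumes
each sample yields the cylinder number of one queued request and that the
seek distance is the absolute difference between consecutive sampled
cylinders. The function name bucket_samples and the sample data are
hypothetical.

```python
from collections import Counter

BUCKET_WIDTH = 8  # cylinders per bucket, as stated in the text


def bucket_samples(cylinders):
    """Tally access locations and seek distances in 8-cylinder buckets.

    `cylinders` is a hypothetical list of cylinder numbers, in the order
    the sampled requests were observed on one disk.
    """
    location = Counter()
    seek = Counter()
    previous = None
    for cyl in cylinders:
        # Round the cylinder number down to the start of its bucket.
        location[cyl // BUCKET_WIDTH * BUCKET_WIDTH] += 1
        if previous is not None:
            distance = abs(cyl - previous)
            seek[distance // BUCKET_WIDTH * BUCKET_WIDTH] += 1
        previous = cyl
    return location, seek


if __name__ == "__main__":
    loc, seek = bucket_samples([3, 5, 40, 42, 200])
    for start in sorted(loc):
        print("cylinders %4d-%4d: %d" % (start, start + BUCKET_WIDTH - 1, loc[start]))
    for start in sorted(seek):
        print("seek dist %4d-%4d: %d" % (start, start + BUCKET_WIDTH - 1, seek[start]))
```

A location table concentrated in a few buckets suggests that the files on
those cylinders are candidates for relocation when the pack layout is
rearranged.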


10.3 Daily Report Generation 


The previous part described the commands available to users to initiate 
activity observations. It is probably desirable for each installation to 
routinely monitor and record system activity in a standard way for historical 
analysis. This part describes the steps that a system administrator may 
follow to automatically produce a standard daily report of system activity. 


10.3.1 Facilities 


• sadc - The executable module of sadc.c (see Attachment 10-1 at the
end of this chapter) which reads system counters from /dev/kmem
and records them to a file. In addition, two frequency arguments are
usually specified to indicate the sampling interval and number of
samples to be taken. In case no frequency arguments are given, it
writes a dummy record in the file to indicate a system restart.


• sa1 - The shell procedure that invokes sadc to write system counters
in the daily data file /usr/adm/sa/sadd where dd represents the day of
the month. It may be invoked with sampling interval and iterations as
arguments.


• sa2 - The shell procedure that invokes the sar command to generate
daily report /usr/adm/sa/sardd from the daily data file
/usr/adm/sa/sadd. It also removes daily data files and report files
after 7 days. The starting and ending times and all report options of
sar are applicable to sa2.


10.3.2 Suggested Operational Setup 


It is suggested that cron(1M) control the normal data collection and
report generation operations. For example, the sample entries in
/usr/spool/cron/crontab/sys:


0 * * * 0,6 /usr/lib/sa/sa1
0 18-7 * * 1-5 /usr/lib/sa/sa1
0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3


would cause the data collection program sadc to be invoked every hour on
the hour. Depending on the arguments presented, it writes data to the data
file either once or three times at 20-minute intervals. Therefore, under the
control of cron(1M), the data file is written every 20 minutes between 8:00
and 18:00 on weekdays and hourly at other times.
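

The resulting weekday schedule can be worked out with a small sketch. It is
an illustration only, written in Python, and it assumes that "sa1 1200 3"
writes its three records 0, 20, and 40 minutes after it is started. The
function name weekday_sample_times is hypothetical.

```python
def weekday_sample_times():
    """Return (hour, minute) pairs at which a record is written on a weekday.

    Assumes the sample crontab entries from the text and that
    "sa1 1200 3" writes records 0, 20, and 40 minutes after it starts.
    """
    times = []
    for hour in range(24):
        if 8 <= hour <= 17:
            # 0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3
            offsets = [i * 1200 // 60 for i in range(3)]
        else:
            # 0 18-7 * * 1-5 /usr/lib/sa/sa1
            offsets = [0]
        times.extend((hour, minute) for minute in offsets)
    return times


if __name__ == "__main__":
    schedule = weekday_sample_times()
    print(len(schedule), "records per weekday")
    print(["%02d:%02d" % t for t in schedule[8:14]])
```

Under these assumptions a weekday data file receives 44 records: 30 during
prime time and 14 during the remaining hours.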


Note that data samples are taken more frequently during prime time on 
weekdays to make them available for a finer and more detailed graphical 
display. It is suggested that sa1 be invoked hourly rather than once
every day; this ensures that, if the system crashes, data collection will
be resumed within an hour after the system is restarted.


Because system activity counters restart from zero when the system is 
restarted, a special record is written on the data file to reflect this situation. 
This process is accomplished by invoking sadc with no frequency
arguments within /etc/rc when going to multiuser state:


su adm -c "/usr/lib/sa/sadc /usr/adm/sa/sa`date +%d`"


Cron(1M) also controls the invocation of sar to generate the daily report via 
shell procedure sa2. One may choose the time period the daily report is to 
cover and the groups of system activity to be reported. For instance, if: 


0 20 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:00 -i 3600 -uybd


is an entry in /usr/spool/cron/crontab/sys, cron will execute the sar
command to generate daily reports from the daily data file at 20:00 on
weekdays. The daily report covers the CPU utilization, terminal device
activity, buffer usage, and device activity every hour from 8:00 to 18:00.


In case of a shortage of disk space or for any other reason, these data
files and report files can be removed by the superuser. The manual entry
sar(1M) describes the daily report generation procedure.


ATTACHMENT 10-1


The source files and shell programs of the system activity package are in
directory /usr/src/cmd/sa.


sa.h              The system activity header file defines the structure
                  of the data file and device information for measured
                  devices. It is included in sadc.c, sar.c, and
                  timex.c.


sadc.c            The data collection program that accesses
                  /dev/kmem to read the system activity counters and
                  writes data either on standard output or on a binary
                  data file. It is invoked by the sar command
                  generating a real-time report. It is also invoked
                  indirectly by entries in /usr/spool/cron/crontab/sys to
                  collect system activity data.


sar.c             The report generation program invokes sadc to
                  examine system activity data, generates reports in
                  real time, and saves the data to a file for later
                  usage. It may also generate system activity reports
                  from an existing data file. It is invoked indirectly by
                  cron to generate daily reports.


saghdr.h          The header file for saga.c and sagb.c. It contains
                  data structures and variables used by saga.c and
                  sagb.c.


saga.c & sagb.c   The graph generation program that first invokes sar
                  to format the data of a data file in a tabular form and
                  then displays the sar data in graphical form.


sa1.sh            The shell procedure that invokes sadc to write data
                  file records. It is activated by entries in
                  /usr/spool/cron/crontab/sys.


sa2.sh            The shell procedure that invokes sar to generate the
                  report. It also removes the daily data files and daily
                  report files after a week. It is activated by an entry
                  in /usr/spool/cron/crontab/sys on weekdays.


timex.c           The program that times a command and generates a
                  system activity or process accounting report.


sadp.c            The program that samples and reports disk
                  activities.


ATTACHMENT 10-2


struct sysinfo {
        time_t  cpu[4];
#define CPU_IDLE        0
#define CPU_USER        1
#define CPU_KERNAL      2
#define CPU_WAIT        3
        time_t  wait[3];
#define W_IO            0
#define W_SWAP          1
#define W_PIO           2
        long    bread;
        long    bwrite;
        long    lread;
        long    lwrite;
        long    phread;
        long    phwrite;
        long    swapin;
        long    swapout;
        long    bswapin;
        long    bswapout;
        long    pswitch;
        long    syscall;
        long    sysread;
        long    syswrite;
        long    sysfork;
        long    sysexec;
        long    runque;
        long    runocc;
        long    swpque;
        long    swpocc;
        long    iget;
        long    namei;
        long    dirblk;
        long    readch;
        long    writech;
        long    rcvint;
        long    xmtint;
        long    mdmint;
        long    rawch;
        long    canch;
        long    outch;
        long    msg;
        long    sema;
};


ATTACHMENT 10-3


The derivation of the reported items is given in this attachment. Each item
discussed below is the data difference sampled at two distinct times t1 and
t2.


CPU Utilization

%-of-cpu-x = cpu-x / (cpu-idle + cpu-user + cpu-kernel + cpu-wait) * 100

where cpu-x is cpu-idle, cpu-user, cpu-kernel (cpu-sys), or cpu-wait.


Cache Hit Ratio

%-of-cache-I/O = (logical-I/O - block-I/O) / logical-I/O * 100

where cache I/O is cache read or cache write.


Disk or Tape I/O Activity

%-of-busy = I/O-active / (t2 - t1) * 100;

avg-queue-length = I/O-resp / I/O-active;

avg-wait = (I/O-resp - I/O-active) / I/O-ops;

avg-service-time = I/O-active / I/O-ops.


Queue Activity

avg-x-queue-length = x-queue / x-queue-occupied-time;

%-of-x-queue-occupied-time = x-queue-occupied-time / (t2 - t1);

where x-queue is run queue or swap queue.


The Rest of System Activity

avg-rate-of-x = x / (t2 - t1)

where x is swap in/out, blks swapped in/out, terminal device activities,
read/write characters, block read/write, logical read/write, process switch,
system calls, read/write, fork/exec, iget, namei, directory blocks read,
disk/tape I/O activities, message, or semaphore activities.
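

As a worked illustration of the derivations in this attachment, the
following Python sketch applies them to counter differences. It is not taken
from sar.c; the function and argument names are hypothetical, every argument
is assumed to be the difference between two samples, all time quantities are
assumed to be in the same unit, and the denominators are assumed to be
nonzero.

```python
def cpu_percentages(cpu_delta):
    """cpu_delta maps idle, user, kernel, and wait to their time differences."""
    total = sum(cpu_delta.values())
    return {name: value / total * 100 for name, value in cpu_delta.items()}


def cache_hit_percent(logical_io, block_io):
    """Share of logical I/O satisfied without a block transfer."""
    return (logical_io - block_io) / logical_io * 100


def device_activity(io_active, io_resp, io_ops, elapsed):
    """Disk or tape figures; elapsed is t2 - t1."""
    return {
        "busy_percent": io_active / elapsed * 100,
        "avg_queue_length": io_resp / io_active,
        "avg_wait": (io_resp - io_active) / io_ops,
        "avg_service_time": io_active / io_ops,
    }


def queue_activity(queue_sum, occupied_time, elapsed):
    """Run queue or swap queue figures, exactly as the formulas are written."""
    return {
        "avg_queue_length": queue_sum / occupied_time,
        "occupied_time_ratio": occupied_time / elapsed,
    }


def average_rate(count_delta, elapsed):
    """Rate for any of the remaining counters."""
    return count_delta / elapsed


if __name__ == "__main__":
    print(cpu_percentages({"idle": 200, "user": 100, "kernel": 60, "wait": 40}))
    print(cache_hit_percent(logical_io=1000, block_io=250))
    print(device_activity(io_active=30, io_resp=90, io_ops=15, elapsed=60))
```

For example, 1000 logical reads that required 250 block reads give a cache
read percentage of 75.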


11. UUCP ADMINISTRATION


This chapter describes how a uucp network is set up, the format of control 
files, and administrative procedures. Administrators should be familiar with 
the manual pages for each of the uucp related commands. 


11.1 Planning 


In setting up a network of UNIX systems, there are several considerations 
that should be taken into account before configuring each system on the 
network. The following parts attempt to outline the most important 
considerations. 


11.1.1 Extent of the Network 


Some basic decisions about access to processors in the network must be 
made before attempting to set up the configuration files. If an administrator 
has control over only one processor and an existing network is being joined, 
then the administrator must decide what level of access should be granted 
to other systems. The other members of the network must make a similar 
decision for the new system. The UNIX system password mechanism is 
used to grant access to other systems. The file /usr/lib/uucp/USERFILE 
restricts access by other systems to parts of the file system tree, and the file 
/usr/lib/uucp/L.sys on the local processor determines how many other
systems on the network can be reached. 


When setting up more than one processor, the administrator has control of a 
larger portion of the network and can make more decisions about the setup. 
For example, the network can be set up as a private network where only 
those machines under the direct control of the administrator can access 
each other. Granting no access to machines outside the network can be 
done if security is paramount; however, this is usually impractical. Very 
limited access can be granted to outside machines by each of the systems 
on the private network. Alternatively, access to/from the outside world can
be confined to only one processor. This is frequently done to minimize the 
effort in keeping access information (passwords, phone numbers, login 
sequences, etc.) updated and to minimize the number of security holes for 
the private network. 


11.1.2 Hardware and Line Speeds 

There are only two supported means of interconnection by uucp(1):
1. Direct connection using a null modem. 
2. Connection over the Direct Distance Dialing (DDD) network. 


In choosing hardware, the equipment used by other processors on the 
network must be considered. For example, if some systems on the network
have only 103-type (300-baud) data sets, then communication with them is
not possible unless the local system has a 300-baud data set connected to
a calling unit. (Most data sets available on systems are 1200-baud.) If
hard-wired connections are to be used between systems, then the distance 
between systems must be considered since a null modem cannot be used 
when the systems are separated by more than several hundred feet. The 
limit for communication at 9600-baud is about 800 to 1000 feet. However, 
the RS232 specification and Western Electric Support Groups only allow for 
less than 50 feet. Limited distance modems must be used beyond 50 feet 
as noise on the lines becomes a problem. 


11.1.3 Maintenance and Administration 


There is a minimum amount of maintenance that must be provided on each
system to keep the access files updated, to ensure that the network is 
running properly, and to track down line problems. When more than one 
system is involved, the job becomes more difficult because there are more 
files to update and because users are much less patient when failures occur 
between machines that are under local control. 


11.2 UUCP Software 


Figure 10-1 (at the end of this chapter) is an illustration of the daemons 
used by the uucp network to communicate with another system. The 
uucp(1) or uux(1) command queues users' requests and spawns the uucico
daemon to call another system. Figure 10-2 (at the end of this chapter) 
illustrates the structure of uucico and the tasks that it performs in 
communicating with another system. Uucico initiates the call to another 
system and performs the file transfer. On the receiving side, uucico is 
invoked to receive the transfer. Remote execution jobs are actually done by 
transferring a command file to the remote system and invoking a daemon 
(uuxqt) to execute that command file and return the results. 


11.3 Installation 


The uucp(1) package is delivered as part of the standard UNIX system 
distribution. It resides in its own subdirectory (called uucp) in the 
commands area and has its own make file (uucp.mk). The uucp package is 
installed as part of the normal distribution; however, if it must be reinstalled 
for any reason, then the sequence 


make -f uucp.mk install

should be executed. 

11.3.1 Object Modules 

The following object modules are installed as part of the uucp make
procedure.


1. Uucp - The file transfer command.

2. Uux - The remote execution command.

3. Uucico - The uucp network daemon.

4. Uustat - Network status command.

5. Uuclean - Cleanup command.

6. Uusub - The command for monitoring and creating a subnetwork.

7. Uuxqt - The remote execution daemon.

8. Uudemon.day - A shell procedure that is invoked each day to
maintain the network. Shell scripts for execution each week
(uudemon.wk) and each hour (uudemon.hr) are also distributed.


11.3.2 Password File 


To allow remote systems to call the local system, password entries must be 
made for any uucp logins. For example, 


nuucp:zaaAA:6:1:UUCP.Admin:/usr/spool/uucppublic:/usr/lib/uucp/uucico


Note that the uucico daemon is used for the shell, and the spool directory is 
used as the working directory. 


There must also be an entry in the passwd file for a uucp administrative
login. This login is the owner of all the uucp object and spooled data files
and is usually "uucp". For example, the following is an entry in /etc/passwd
for this administrative login:


uucp:zAvLCKp:5:1:UUCP.Admin:/usr/lib/uucp: 


Note that the standard shell is used instead of uucico. If an owner other
than "uucp" is chosen, the make file for uucp (/usr/src/cmd/uucp/uucp.mk)
must be edited. The line "OWNER=uucp" must be changed to reflect the
new owner login. 


11.3.3 Lines File 


The file /usr/lib/uucp/L-devices contains the list of all lines that are directly 
connected to other systems or are available for calling other systems. The 
file contains the attributes of the lines and whether the line is a permanent 
connection or can call via a dialer. The format of the file is 


type line call-device speed protocol 


where each field is


type          Two keywords are used to describe whether a line is
              directly connected to another system (DIR) or uses an
              automatic calling unit (ACU). An X.25 permanent virtual
              circuit would use the DIR keyword.


line          This is the device name for the line (e.g., ttyab for a
              direct line, cul0 for a line connected to an ACU).


call-device   If the ACU keyword is specified, this field contains the
              device name of the ACU. Otherwise, the field is
              ignored; however, a placeholder must be used in this
              field so that the protocol field can be interpreted.


speed         The line speed that the connection is to run at. (The
              speed field is currently ignored if an X.25 link is used.)


protocol      This is an optional field that need only be filled in if the
              connection is for a protocol other than the default
              terminal protocol. The X.25 protocol is the only other
              protocol supported and the single character x is used to
              select this protocol.


The following entries illustrate various types of connections: 


DIR ttyab 0 9600 
ACU cul0 cua0 1200 
DIR x25.s0 0 300 x 


The first entry is for a hard-wired line running at 9600-baud between two
systems. Note that the call-device field is zero. The second entry is for a
line with a 1200-baud ACU. The last entry is for an X.25 synchronous direct
connection between systems. Note that the protocol field is filled in and that
the call-device and line speed fields are meaningless.
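

The field layout can be made concrete with a short parsing sketch. This is
an illustration in Python, not part of the uucp package; the names
DeviceEntry and parse_l_devices_line are hypothetical, and the validation
rules are assumptions drawn only from the field descriptions above.

```python
from typing import NamedTuple, Optional


class DeviceEntry(NamedTuple):
    kind: str                 # "DIR" or "ACU"
    line: str                 # device name for the line
    call_device: str          # ACU device name, or a placeholder such as "0"
    speed: int                # line speed
    protocol: Optional[str]   # "x" for X.25, None for the default protocol


def parse_l_devices_line(text):
    """Split one L-devices entry into its four or five fields."""
    fields = text.split()
    if len(fields) not in (4, 5):
        raise ValueError("expected 4 or 5 fields: %r" % text)
    kind, line, call_device, speed = fields[:4]
    if kind not in ("DIR", "ACU"):
        raise ValueError("unknown type keyword: %r" % kind)
    protocol = fields[4] if len(fields) == 5 else None
    return DeviceEntry(kind, line, call_device, int(speed), protocol)


if __name__ == "__main__":
    for entry in ("DIR ttyab 0 9600", "ACU cul0 cua0 1200", "DIR x25.s0 0 300 x"):
        print(parse_l_devices_line(entry))
```

The placeholder in the call-device field is what allows the optional fifth
field to be recognized as the protocol.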


11.3.3.1 Naming Conventions 


It is often useful when naming lines that are directly connected between 
systems or which are dedicated to calling other systems to choose a naming 
scheme that conveys the use of the line. In the earlier examples, the name 
ttyab is used for the line that directly connects two systems named a and b. 
Similarly, lines associated with calling units are best given names that relate 
them to the calling unit (note the names cul0 and cua0 to specify the line
and calling unit, respectively). 


11.3.4 System File 


Each entry in the file /usr/lib/uucp/L.sys represents a system that can be
called by the local uucp programs. More than one line may be present for a
particular system. In this case, the additional lines represent alternative
communication paths that will be tried in sequential order. The fields are
described below.


system name   Name of the remote system.


time          This is a string that indicates the days-of-week and
              times-of-day when the system should be called (e.g.,
              MoTuTh0800-1730).


              The day portion may be a list containing Su, Mo, Tu,
              We, Th, Fr, Sa; or it may be Wk for any week-day or
              Any for any day. The time should be a range of times
              (e.g., 0800-1230). If no time portion is specified, any
              time of day is assumed to be allowed for the call. Note
              that a time range that spans 0000 is permitted;
              0800-0600 means all times are allowed other than times
              between 6 and 8 am. An optional subfield is available
              to specify the minimum time (minutes) before a retry
              following a failed attempt. The subfield separator is a
              "," (e.g., Any,9 means call any time but wait at least 9
              minutes before retrying the call after a failure has
              occurred).


device        This is either ACU or the hard-wired device name to be
              used for the call. For the hard-wired case, the last part
              of the special file name is used (e.g., tty0).


class         This is usually the line speed for the call (e.g., 300).


phone         The phone number is made up of an optional alphabetic
              abbreviation (dialing prefix) and a numeric part. The
              abbreviation should be one that appears in the
              L-dialcodes file (e.g., mh1212, boston555-1212). For the
              hard-wired devices, this field contains the same string
              as used for the device field.


login         The login information is given as a series of fields and
              subfields in the format


              [expect send] ...


              where expect is the string expected to be read and
              send is the string to be sent when the expect string is
              received.


              The expect field may be made up of subfields of the
              form


              expect[-send-expect] ...


              where the send is sent if the prior expect is not
              successfully read and the expect following the send is
              the next expected string. (For example, login--login will
              expect login; if it gets it, the program will go on to the
              next field; if it does not get login, it will send null
              followed by a new line, then expect login again.) If no
              characters are initially expected from the remote
              machine, the string "" (a null string) should be used in
              the first expect field.


              There are two special names available to be sent during
              the login sequence. The string EOT will send an EOT
              character, and the string BREAK will try to send a
              BREAK character. (The BREAK character is simulated
              using line speed changes and null characters and may
              not work on all devices and/or systems.) A number from
              1 to 9 may follow the BREAK (e.g., BREAK1 will send
              1 null character instead of the default of 3). Note that
              BREAK1 usually works best for 300-/1200-baud lines.
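

The rules for the time field can be sketched as follows. This Python
fragment is an illustration, not the uucico implementation. The function
names are hypothetical, weekdays are numbered 0 (Monday) to 6 (Sunday), the
end points of a range are assumed to be inclusive, and a range that spans
0000 is checked against the clock time only.

```python
import re

DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]  # index 0 is Monday


def parse_time_field(field):
    """Split a field such as 'MoTuTh0800-1730,9' into days, window, retry."""
    schedule, _, retry = field.partition(",")
    match = re.fullmatch(r"([A-Za-z]+)(?:(\d{4})-(\d{4}))?", schedule)
    if not match:
        raise ValueError("bad time field: %r" % field)
    day_part, start, end = match.groups()
    if day_part == "Any":
        days = set(range(7))
    elif day_part == "Wk":
        days = set(range(5))
    else:
        names = re.findall(r"[A-Z][a-z]", day_part)
        if "".join(names) != day_part or any(n not in DAYS for n in names):
            raise ValueError("bad day list: %r" % day_part)
        days = {DAYS.index(n) for n in names}
    window = (int(start), int(end)) if start else None
    return days, window, (int(retry) if retry else None)


def call_allowed(field, weekday, hhmm):
    """True if a call may be placed on `weekday` at clock time `hhmm`."""
    days, window, _retry = parse_time_field(field)
    if weekday not in days:
        return False
    if window is None:
        return True          # no time portion: any time of day
    start, end = window
    if start <= end:
        return start <= hhmm <= end
    return hhmm >= start or hhmm <= end   # range spans 0000


if __name__ == "__main__":
    print(call_allowed("MoTuTh0800-1730", 0, 900))
    print(call_allowed("Any0800-0600", 1, 700))
    print(parse_time_field("Any,9"))
```

With the field Any0800-0600, a call at 0700 is refused and a call at 2300 is
allowed, which matches the description of a range that spans 0000.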


There are several character strings that cause specific actions when they 
are a part of a string sent during the login sequence. 


\s   Send a space character.
\d   Delay one second before sending or reading more characters.
\c   If at the end of a string, suppress the new-line that is normally
     sent. Ignored otherwise.
\N   Send a null character.


These character strings are useful for making uucp communicate via direct
lines to data switches.


A typical entry in the L.sys file would be


sys Any ACU 300 mh7654 login uucp ssword: word


The expect algorithm matches all or part of the input string as illustrated in
the password field above.
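

The pairing of expect and send strings in such an entry can be sketched in
Python as shown below. This is an illustration only; the function names are
hypothetical, the login fields are assumed to begin at the sixth
blank-separated field, and the partial match is modeled as a simple
substring test.

```python
def parse_login_fields(fields):
    """Pair up the [expect send] ... tokens from the tail of an L.sys entry.

    Each expect token is split on "-" so that expect[-send-expect] ...
    becomes a list alternating expect, send, expect.
    """
    pairs = []
    for i in range(0, len(fields), 2):
        expect = fields[i].split("-")
        send = fields[i + 1] if i + 1 < len(fields) else None
        pairs.append((expect, send))
    return pairs


def expect_matches(expect, received):
    """Model of the partial match: expect may be any part of the input."""
    return expect in received


if __name__ == "__main__":
    entry = "sys Any ACU 300 mh7654 login uucp ssword: word"
    login = entry.split()[5:]
    print(parse_login_fields(login))
    print(expect_matches("ssword:", "Password:"))
```

Because only part of the input has to match, the expect string ssword: is
satisfied whether the remote system prompts with "Password:" or
"password:".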


11.3.5 Dialing Prefixes 


The L-dialcodes file contains the dial-code abbreviations used in the L.sys file (e.g., py,
mh, boston). The entry format is 


abb dial-seq 


where abb is the abbreviation and dial-seq is the dial sequence to call that 
location. 


The line


py 165-


would be set up so that entry py7777 would send 165-7777 to the dial unit.


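The expansion of a dialing prefix can be sketched in a few lines of Python.
This is an illustration rather than the uucp code; the function names
parse_dialcodes and expand_phone are hypothetical, and an unknown
abbreviation is simply reported as an error.

```python
import re


def parse_dialcodes(text):
    """Read 'abb dial-seq' lines into a dictionary."""
    codes = {}
    for raw in text.splitlines():
        fields = raw.split()
        if len(fields) == 2:
            codes[fields[0]] = fields[1]
    return codes


def expand_phone(phone, dialcodes):
    """Replace a leading alphabetic abbreviation by its dial sequence."""
    abbreviation, rest = re.fullmatch(r"([A-Za-z]*)(.*)", phone).groups()
    if not abbreviation:
        return rest
    if abbreviation not in dialcodes:
        raise KeyError("no dial code for %r" % abbreviation)
    return dialcodes[abbreviation] + rest


if __name__ == "__main__":
    codes = parse_dialcodes("py 165-\n")
    print(expand_phone("py7777", codes))
```

A phone field without an alphabetic prefix is passed through unchanged.

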
11.3.6 Userfile 


The USERFILE contains user accessibility information. It specifies four
types of constraints:


1. Files that can be accessed by a normal user of the local machine.


2. Files that can be accessed from a remote computer.


3. Login name used by a particular remote computer.


4. Whether a remote computer should be called back in order to confirm
its identity.


Each line in the file has the format 


login,sys [c] pathname [pathname] ... 


where


login      is the login name for a user or the remote computer.
sys        is the system name for a remote computer.
c          is the optional call-back required flag.
pathname   is a pathname prefix that is acceptable for sys.


The constraints are implemented as follows:


1. When the program is obeying a command stored on the local machine, 
the pathnames allowed are those given on the first line in the 
USERFILE that has the login name of the user who entered the 
command. If no such line is found, the first line with a null login name 
is used. 


2. When the program is responding to a command from a remote 
machine, the pathnames allowed are those given on the first line in the 
file that has the system name that matches the remote machine. If no 
such line is found, the first one with a null system name is used.


3. When a remote computer logs in, the login name that it uses must 
appear in the USERFILE. There may be several lines with the same 
login name but one of them must either have the name of the remote 
system or must contain a null system name.


4. If the line matched in (3.) contains a “c”, the remote machine is called
back before any transactions take place. 


The line


u,m /usr/xyz


allows machine m to login with name u and request the transfer of files 
whose names start with /usr/xyz. The line 


you, /usr/you 


allows the ordinary user you to issue commands for files whose name starts 
with /usr/you. (This type restriction is seldom used.) The lines 


u,m /usr/xyz /usr/spool
u, /usr/spool


allow any remote machine to log in with name u. If its system name is not
m, it can only ask to transfer files whose names start with /usr/spool. If it is
system m, it can send files from paths /usr/xyz as well as /usr/spool. The
lines


root, /
, /usr


allow any user to transfer files beginning with /usr but the user with login 
root can transfer any file. (Note that any file that is to be transferred must 
be readable by anybody.) 
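

The first two matching rules can be sketched in Python as shown below. The
sketch is an illustration, not the uucp source: the names UserfileEntry,
parse_userfile, local_user_may_access, and remote_system_may_access are
hypothetical, and a pathname prefix is assumed to match by simple string
comparison.

```python
from typing import List, NamedTuple


class UserfileEntry(NamedTuple):
    login: str
    system: str
    callback: bool
    paths: List[str]


def parse_userfile(text):
    """Parse 'login,sys [c] pathname [pathname] ...' lines."""
    entries = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        login, _, system = fields[0].partition(",")
        rest = fields[1:]
        callback = bool(rest) and rest[0] == "c"
        if callback:
            rest = rest[1:]
        entries.append(UserfileEntry(login, system, callback, rest))
    return entries


def _first_match(entries, attribute, name):
    """First line whose field equals name, else first line with a null field."""
    for wanted in (name, ""):
        for entry in entries:
            if getattr(entry, attribute) == wanted:
                return entry
    return None


def local_user_may_access(entries, user, path):
    """Rule 1: a command entered by a user of the local machine."""
    entry = _first_match(entries, "login", user)
    return entry is not None and any(path.startswith(p) for p in entry.paths)


def remote_system_may_access(entries, system, path):
    """Rule 2: a command received from a remote machine."""
    entry = _first_match(entries, "system", system)
    return entry is not None and any(path.startswith(p) for p in entry.paths)


if __name__ == "__main__":
    entries = parse_userfile("u,m /usr/xyz /usr/spool\nu, /usr/spool\n")
    print(remote_system_may_access(entries, "m", "/usr/xyz/data"))
    print(remote_system_may_access(entries, "other", "/usr/xyz/data"))
```

Because the first matching line wins, the order of the lines in the
USERFILE matters.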


11.3.7 Forwarding File 


There are two files that allow restrictions to be placed on the forwarding 
mechanism. The format of the entries in each file is the same:


system
or
system,user1,user2,...


The file ORIGFILE (/usr/lib/uucp/ORIGFILE) restricts the access of systems
that are attempting to forward through the local system. The file contains
the list of systems (and users) for whom the local system is willing to
forward. Each entry refers to the system that was the source of the original
job and not the name of the last system to forward the file. The second file,
FWDFILE (/usr/lib/uucp/FWDFILE), is a list of valid systems that a job can
be forwarded to. (It is not necessarily the name of the destination of a job,
but merely the next valid node.) This file will be a subset of the L.sys file and
can be used to prevent forwarding to systems that are very expensive to
reach but to which access by local users is allowed (e.g., links to overseas
universities). If neither of these files exists, uucp will forward for any
system. As an example, if the entry for system australia
were in the ORIGFILE but not in the FWDFILE on system mhtsa, it would
mean that system australia would be capable of forwarding jobs into the
network via system mhtsa. However, no systems in the network could
forward a job to australia via system mhtsa.
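

The combined effect of the two files can be sketched in Python. This is an
illustration under stated assumptions, not the uucp implementation: the
function names are hypothetical, a missing file is assumed to impose no
restriction, an ORIGFILE entry without users is assumed to admit every user
of that system, and any user names in FWDFILE are ignored.

```python
def parse_forward_file(text):
    """Map each system to its set of users (an empty set means any user)."""
    table = {}
    for raw in text.splitlines():
        names = [name.strip() for name in raw.split(",") if name.strip()]
        if names:
            table[names[0]] = set(names[1:])
    return table


def may_forward(origin_system, origin_user, next_system, origfile, fwdfile):
    """Decide whether the local system forwards a job.

    origfile and fwdfile are tables from parse_forward_file, or None when
    the corresponding file does not exist.
    """
    if origfile is not None:
        if origin_system not in origfile:
            return False
        users = origfile[origin_system]
        if users and origin_user not in users:
            return False
    if fwdfile is not None and next_system not in fwdfile:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical contents on system mhtsa; mhtsb is an invented neighbor.
    origfile = parse_forward_file("australia\nmhtsb,tom\n")
    fwdfile = parse_forward_file("mhtsb\n")
    print(may_forward("australia", "joe", "mhtsb", origfile, fwdfile))
    print(may_forward("mhtsb", "tom", "australia", origfile, fwdfile))
```

In this hypothetical setup, jobs that originate on australia are forwarded
into the network, while a job addressed to australia is refused because
australia is absent from FWDFILE.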


11.4 Administration


The role of the uucp administrator depends heavily on the amount of traffic 
that enters or leaves a system and the quality of the connections that can 
be made to and from that system. For the average system, only a modest
amount of traffic (100 to 200 files per day) passes through the system and
little if any intervention with the uucp automatic cleanup functions is
necessary. Systems that pass large numbers of files (200 to 10,000) may
require more attention when problems occur. The following parts describe
the routine administrative tasks that must be performed by the administrator
or are automatically performed by the uucp package. The part on problems
describes the most frequent problems and how to deal with them
effectively.


11.4.1 Cleanup 


The biggest problem in a dialup network like uucp is dealing with the 
backlog of jobs that cannot be transmitted to other systems. The following 
cleanup activities should be routinely performed by shell scripts started from
cron(1M).


11.4.1.1 Cleanup of Undeliverable Jobs 


The uudemon.day procedure usually contains an invocation of the uuclean
command to purge any jobs that are older than some fixed time (usually 72
hours). A similar procedure is usually used to purge any lock or status files.
An example invocation of uuclean(1M) to remove both job files and old
status files every 48 hours is:


/usr/lib/uucp/uuclean -pST -pC -n48


11.4.1.2 Cleanup of the Public Area 


In order to keep the local file system from overflowing when files are sent to 
the public area, the uudemon.day procedure is usually set up with a find 
command to remove any files that are older than 7 days. This interval may 
need to be shortened if there is not sufficient space to devote to the public 
area. 
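

The age test applied to the public area can be sketched in Python. The
fragment below is an illustration and not the contents of uudemon.day; it
only lists candidate files and removes nothing. The function name
stale_files is hypothetical, and age is assumed to be measured from each
file's modification time.

```python
import os
import time


def stale_files(root, max_age_days=7, now=None):
    """Return the files under root last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat so that a symbolic link is judged by its own age.
            if os.lstat(path).st_mtime < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    for path in stale_files("/usr/spool/uucppublic"):
        print(path)
```

Passing a smaller max_age_days value corresponds to shortening the interval
when space in the public area is scarce.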


11.4.1.3 Compaction of Log Files 


The files SYSLOG and LOGFILE that contain logging information are 
compacted daily (using the pack command from the shell script 
uudemon.day) and should be kept for 1 week before being overwritten. 


11.4.2 Polling Other Systems 


Systems that are passive members of the network must be polled by other 
systems in order for their files to be sent. This can be arranged by using the 
uusub(1) command as follows:


uusub -cmhtsd


which will call mhtsd when it is invoked.


11.4.3 Problems 


The following sections list the most frequent problems that appear on 
systems that make heavy use of uucp(1). 


11.4.3.1 Out of Space 


The file system used to spool incoming or outgoing jobs can run out of 
space and prevent jobs from being spawned or received from remote 
systems. The inability to receive jobs is the worse of the two conditions. 
When file space does become available, the system will be flooded with the 
backlog of traffic. 


11.4.3.2 Bad ACU and Modems 


The ACU and incoming modems occasionally cause problems that make it 
difficult to contact other systems or to receive files. These problems are 
usually readily identifiable since LOGFILE entries will usually point to the bad 
line. If a bad line is suspected, it is useful to use the cu(1) command to try 
calling another system using the suspected line. 


11.4.3.3 Administrative Problems 


Some uucp networks have so many members that it is difficult to keep track 
of changing passwords, changing phone numbers, or changing logins on 
remote systems. This can be a very costly problem since ACUs will be tied
up calling a system that cannot be reached. 


11.5 Debugging 


In order to verify that a system on the network can be contacted, the uucico 
daemon can be invoked from a user’s terminal directly. For example, to 
verify that mhtsd can be contacted, a job would be queued for that system 
as follows: 


uucp -r file mhtsd!~/tom


The -r option forces the job to be queued but does not invoke the daemon 
to process the job. The uucico command can then be invoked directly: 


/usr/lib/uucp/uucico -r1 -x4 -smhtsd


The -r1 option is necessary to indicate that the daemon is to start up in
master mode (i.e., it is the calling system). The -x4 specifies the level of
debugging that is to be printed. Higher levels of debugging can be printed
(greater than 4) but require familiarity with the internals of uucico. If
several jobs are queued for the remote system, it is not possible to force
uucico to send one particular job first. The contents of LOGFILE should
also be monitored for any error indications that it posts. Frequently,
problems can be isolated by examining the entries in LOGFILE associated
with a particular system. The file ERRLOG also contains error indications.


Figure 10-2. Uucp Network Daemon